diff --git "a/training_log_20250123_001921.txt" "b/training_log_20250123_001921.txt" new file mode 100644--- /dev/null +++ "b/training_log_20250123_001921.txt" @@ -0,0 +1,38350 @@ +[2025-01-23 00:19:28,817] torch.distributed.run: [WARNING] +[2025-01-23 00:19:28,817] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:28,817] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +[2025-01-23 00:19:28,817] torch.distributed.run: [WARNING] ***************************************** +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. 
Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. 
Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +df: /root/.triton/autotune: No such file or directory +df: /root/.triton/autotune: No such file or directory +df: /root/.triton/autotune: No such file or directory +df: /root/.triton/autotune: No such file or directory +df: /root/.triton/autotune: No such file or directory + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment 
variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] 
[comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +01/23/2025 00:20:19 - WARNING - llava.train.train - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False +01/23/2025 00:20:19 - INFO - llava.train.train - Training/evaluation parameters TrainingArguments( +_n_gpu=1, +adafactor=False, +adam_beta1=0.9, +adam_beta2=0.999, +adam_epsilon=1e-08, +auto_find_batch_size=False, +bf16=True, +bf16_full_eval=False, +bits=16, +cache_dir=None, +data_seed=None, +dataloader_drop_last=False, +dataloader_num_workers=4, +dataloader_persistent_workers=False, +dataloader_pin_memory=True, +ddp_backend=None, +ddp_broadcast_buffers=None, +ddp_bucket_cap_mb=None, +ddp_find_unused_parameters=None, +ddp_timeout=1800, +debug=[], +deepspeed=./scripts/zero3.json, +disable_tqdm=False, +dispatch_batches=None, +do_eval=False, +do_predict=False, +do_train=False, +double_quant=True, +eval_accumulation_steps=None, +eval_delay=0, +eval_steps=None, +evaluation_strategy=no, +fp16=False, +fp16_backend=auto, +fp16_full_eval=False, +fp16_opt_level=O1, +freeze_mm_mlp_adapter=False, +fsdp=[], +fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, +fsdp_min_num_params=0, +fsdp_transformer_layer_cls_to_wrap=None, +full_determinism=False, +gradient_accumulation_steps=2, +gradient_checkpointing=True, +gradient_checkpointing_kwargs=None, +greater_is_better=None, +group_by_length=False, 
+group_by_modality_length=True, +half_precision_backend=auto, +hub_always_push=False, +hub_model_id=None, +hub_private_repo=False, +hub_strategy=every_save, +hub_token=, +ignore_data_skip=False, +include_inputs_for_metrics=False, +include_num_input_tokens_seen=False, +include_tokens_per_second=False, +jit_mode_eval=False, +label_names=None, +label_smoothing_factor=0.0, +learning_rate=2e-05, +length_column_name=length, +load_best_model_at_end=False, +local_rank=0, +log_level=passive, +log_level_replica=warning, +log_on_each_node=True, +logging_dir=./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/runs/Jan23_00-20-19_dlc1irjyfb0zt5ew-master-0, +logging_first_step=False, +logging_nan_inf_filter=True, +logging_steps=1.0, +logging_strategy=steps, +lora_alpha=16, +lora_bias=none, +lora_dropout=0.05, +lora_enable=False, +lora_r=64, +lora_weight_path=, +lr_scheduler_kwargs={}, +lr_scheduler_type=cosine, +max_grad_norm=1.0, +max_steps=-1, +metric_for_best_model=None, +mm_projector_lr=None, +mm_vision_tower_lr=2e-06, +model_max_length=32768, +mp_parameters=, +mpt_attn_impl=triton, +neftune_noise_alpha=None, +no_cuda=False, +num_train_epochs=1.0, +optim=adamw_torch, +optim_args=None, +output_dir=./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt, +overwrite_output_dir=False, +past_index=-1, +per_device_eval_batch_size=4, +per_device_train_batch_size=1, +prediction_loss_only=False, +push_to_hub=False, +push_to_hub_model_id=None, +push_to_hub_organization=None, +push_to_hub_token=, +quant_type=nf4, +ray_scope=last, +remove_unused_columns=False, +report_to=['wandb'], +resume_from_checkpoint=None, +run_name=llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt, +save_on_each_node=False, +save_only_model=False, +save_safetensors=True, 
+save_steps=10000, +save_strategy=steps, +save_total_limit=1, +seed=42, +skip_memory_metrics=True, +split_batches=False, +tf32=True, +torch_compile=False, +torch_compile_backend=None, +torch_compile_mode=None, +torchdynamo=None, +tpu_metrics_debug=False, +tpu_num_cores=None, +use_cpu=False, +use_ipex=False, +use_legacy_prediction_loop=False, +use_mps_device=False, +warmup_ratio=0.03, +warmup_steps=0, +weight_decay=0.0, +) +01/23/2025 00:20:19 - INFO - llava.train.train - Training/evaluation parameters DataArguments(data_path=None, meta_path='playground/meta_json/llavanext_sample/llava_next_notext_inf37kpolishmd_de35k_know40k_knins40k_creation10kfixed_chart11kmerge_tqa8k_info28k_gpt.json', lazy_preprocess=True, is_multimodal=False, image_folder=None, image_aspect_ratio='anyres', image_grid_pinpoints='[(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]', image_crop_resolution=None, image_split_resolution=None, use_data_resampling=False) +[INFO|configuration_utils.py:727] 2025-01-23 00:20:19,305 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/config.json +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:19,305 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. 
+[INFO|configuration_utils.py:792] 2025-01-23 00:20:19,306 >> Model config LlavaQwenConfig { + "architectures": [ + "Qwen2ForCausalLM" + ], + "attention_dropout": 0.0, + "bos_token_id": 151643, + "eos_token_id": 151645, + "hidden_act": "silu", + "hidden_size": 5120, + "initializer_range": 0.02, + "intermediate_size": 27648, + "max_position_embeddings": 32768, + "max_window_layers": 70, + "model_type": "llava_qwen", + "num_attention_heads": 40, + "num_hidden_layers": 64, + "num_key_value_heads": 8, + "rms_norm_eps": 1e-06, + "rope_theta": 1000000.0, + "sliding_window": 131072, + "tie_word_embeddings": false, + "torch_dtype": "bfloat16", + "transformers_version": "4.37.2", + "use_cache": true, + "use_sliding_window": false, + "vocab_size": 152064 +} + +[INFO|modeling_utils.py:3473] 2025-01-23 00:20:19,308 >> loading weights file models/qwen/qwen2.5-32B-Instruct/model.safetensors.index.json +[INFO|modeling_utils.py:1426] 2025-01-23 00:20:19,310 >> Instantiating LlavaQwenForCausalLM model under default dtype torch.bfloat16. +[INFO|modeling_utils.py:3582] 2025-01-23 00:20:19,310 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:19,314 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
+[INFO|configuration_utils.py:826] 2025-01-23 00:20:19,321 >> Generate config GenerationConfig { + "bos_token_id": 151643, + "eos_token_id": 151645 +} + +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,530 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,533 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,533 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,534 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,537 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,540 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. 
Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,540 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,541 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,549 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,556 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,579 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,585 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False +[WARNING|configuration_utils.py:607] 2025-01-23 00:20:21,636 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors. +[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:21,643 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:74:74 [0] NCCL INFO cudaDriverVersion 12010 +NCCL version 2.18.6+cuda12.1 +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:81:81 [7] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth 
+dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:76:76 [2] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:78:78 [4] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:75:75 [1] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:75:75 
[1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:75:75 [1] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:75:75 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:75:75 [1] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:75:75 [1] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so 
+dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:77:77 [3] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:80:80 [6] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO Bootstrap : Using eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-master-0:79:79 [5] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO NCCL_IB_HCA set to mlx5 
+dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.37.86<0> +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO comm 0x9b0b4990 rank 0 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO comm 0x9adff870 rank 6 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO comm 0x9ba4da40 rank 7 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO comm 0x9b0f78c0 rank 5 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO comm 0x9b0ba2b0 rank 3 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO comm 0x9aac3cf0 rank 2 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO comm 0x9baa6a30 rank 1 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO comm 0x9bc8c3a0 rank 4 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 
0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO NVLS multicast support is not available on dev 3 +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO NVLS multicast support is not available on dev 2 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO NVLS multicast support is not available on dev 6 +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO NVLS multicast support is not available on dev 1 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO NVLS multicast support is not available on dev 0 +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO NVLS multicast support is not available on dev 4 +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO NVLS multicast support is not available on dev 5 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO NVLS multicast support is not available on dev 7 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
+dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 18 23 22 21 20 17 16 19 26 31 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/38/-1->6->-1 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->14 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] 0/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/-1/-1->7->6 +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] -1/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] -1/-1/-1->5->4 +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/34/-1->2->-1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->10 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 20 19 18 17 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] 4/-1/-1->3->2 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 03/08 : 0 5 
4 7 14 11 10 9 8 13 12 15 22 19 18 17 16 21 20 23 +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/36/-1->4->-1 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->12 [7] 5/-1/-1->4->3 +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 18 23 22 21 20 17 16 19 26 31 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 20 19 18 17 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 22 19 18 17 16 21 20 23 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Trees [0] 1/32/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 03/0 : 4[4] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 01/0 : 2[2] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 03/0 : 63[7] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 07/0 : 63[7] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 07/0 : 4[4] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 05/0 
: 0[0] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 05/0 : 2[2] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 03/0 : 0[0] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 07/0 : 0[0] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 02/0 : 61[5] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 06/0 : 61[5] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 01/0 : 59[3] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/0 : 57[1] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 05/0 : 59[3] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/0 : 57[1] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 01/0 : 4[4] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 05/0 : 4[4] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 03/0 : 7[7] -> 
14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 03/0 : 6[6] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 07/0 : 6[6] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL 
INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:77:361 [3] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:77:361 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:77:361 [3] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:77:361 [3] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:77:361 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:75:360 [1] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:79:362 [5] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:75:360 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:79:362 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. 
+dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:75:360 [1] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:75:360 [1] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:79:362 [5] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:79:362 [5] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:79:362 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:75:360 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:81:359 [7] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-master-0:81:359 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-master-0:81:359 [7] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-master-0:81:359 [7] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-master-0:81:359 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. 
+dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 02/0 
: 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 02/0 : 36[4] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 02/0 : 4[4] -> 36[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 03/0 : 38[6] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 03/0 : 6[6] -> 38[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] 
NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 01/0 : 34[2] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 01/0 : 2[2] -> 34[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/0 : 32[0] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 00/0 : 0[0] -> 32[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via 
NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:81:319 
[7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:78:329 [4] NCCL INFO comm 0x9bc8c3a0 rank 4 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:74:318 [0] NCCL INFO comm 0x9b0b4990 rank 0 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:81:319 [7] NCCL INFO comm 0x9ba4da40 rank 7 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:75:333 [1] NCCL INFO comm 0x9baa6a30 rank 1 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:80:344 [6] NCCL INFO comm 0x9adff870 rank 6 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init COMPLETE 
+dlc1irjyfb0zt5ew-master-0:77:343 [3] NCCL INFO comm 0x9b0ba2b0 rank 3 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:79:349 [5] NCCL INFO comm 0x9b0f78c0 rank 5 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:76:324 [2] NCCL INFO comm 0x9aac3cf0 rank 2 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init COMPLETE +[2025-01-23 00:20:26,339] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 1542, num_elems = 65.53B + +Loading checkpoint shards: 0%| | 0/17 [00:00> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,225 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,226 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,227 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,227 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,235 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,238 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) + +Loading checkpoint shards: 100%|██████████| 17/17 [00:46<00:00, 2.73s/it] +Loading checkpoint shards: 100%|██████████| 17/17 [00:46<00:00, 2.76s/it] +[INFO|modeling_utils.py:4350] 2025-01-23 00:21:13,298 >> All model checkpoint weights were used when initializing LlavaQwenForCausalLM. + +[INFO|modeling_utils.py:4358] 2025-01-23 00:21:13,298 >> All the weights of LlavaQwenForCausalLM were initialized from the model checkpoint at models/qwen/qwen2.5-32B-Instruct. +If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaQwenForCausalLM for predictions without further training. +[INFO|configuration_utils.py:779] 2025-01-23 00:21:13,306 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/generation_config.json +[INFO|configuration_utils.py:826] 2025-01-23 00:21:13,307 >> Generate config GenerationConfig { + "attn_implementation": "flash_attention_2", + "bos_token_id": 151643, + "do_sample": true, + "eos_token_id": [ + 151645, + 151643 + ], + "pad_token_id": 151643, + "repetition_penalty": 1.05, + "temperature": 0.7, + "top_k": 20, + "top_p": 0.8 +} + +Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file vocab.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file merges.txt +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file 
tokenizer_config.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:13,328 >> loading file tokenizer.json +[WARNING|logging.py:314] 2025-01-23 00:21:13,537 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:13 - INFO - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|image_processing_utils.py:373] 2025-01-23 00:21:13,540 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/preprocessor_config.json +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:13,540 >> size should be a dictionary on of the following set of keys: ({'width', 'height'}, {'shortest_edge'}, {'shortest_edge', 'longest_edge'}, {'longest_edge'}), got 336. Converted to {'shortest_edge': 336}. +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:13,540 >> crop_size should be a dictionary on of the following set of keys: ({'width', 'height'}, {'shortest_edge'}, {'shortest_edge', 'longest_edge'}, {'longest_edge'}), got 336. Converted to {'height': 336, 'width': 336}. 
+[INFO|image_processing_utils.py:425] 2025-01-23 00:21:13,540 >> Image processor CLIPImageProcessor { + "crop_size": { + "height": 336, + "width": 336 + }, + "do_center_crop": true, + "do_convert_rgb": true, + "do_normalize": true, + "do_rescale": true, + "do_resize": true, + "image_mean": [ + 0.48145466, + 0.4578275, + 0.40821073 + ], + "image_processor_type": "CLIPImageProcessor", + "image_std": [ + 0.26862954, + 0.26130258, + 0.27577711 + ], + "resample": 3, + "rescale_factor": 0.00392156862745098, + "size": { + "shortest_edge": 336 + } +} + +[INFO|configuration_utils.py:727] 2025-01-23 00:21:13,547 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/config.json +[INFO|configuration_utils.py:792] 2025-01-23 00:21:13,547 >> Model config CLIPVisionConfig { + "attention_dropout": 0.0, + "dropout": 0.0, + "hidden_act": "quick_gelu", + "hidden_size": 1024, + "image_size": 336, + "initializer_factor": 1.0, + "initializer_range": 0.02, + "intermediate_size": 4096, + "layer_norm_eps": 1e-05, + "model_type": "clip_vision_model", + "num_attention_heads": 16, + "num_channels": 3, + "num_hidden_layers": 24, + "patch_size": 14, + "projection_dim": 768, + "transformers_version": "4.37.2" +} + +[INFO|modeling_utils.py:3473] 2025-01-23 00:21:13,548 >> loading weights file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/pytorch_model.bin +[INFO|modeling_utils.py:3582] 2025-01-23 00:21:14,250 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[2025-01-23 00:21:14,405] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 1933, num_elems = 65.83B +[INFO|modeling_utils.py:4340] 2025-01-23 00:21:16,129 >> Some weights of the model checkpoint at 
/fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1 were not used when initializing CLIPVisionModel: ['logit_scale', 'text_model.embeddings.position_embedding.weight', 'text_model.embeddings.position_ids', 'text_model.embeddings.token_embedding.weight', 'text_model.encoder.layers.0.layer_norm1.bias', 'text_model.encoder.layers.0.layer_norm1.weight', 'text_model.encoder.layers.0.layer_norm2.bias', 'text_model.encoder.layers.0.layer_norm2.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.0.mlp.fc1.weight', 'text_model.encoder.layers.0.mlp.fc2.bias', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.encoder.layers.0.self_attn.out_proj.bias', 'text_model.encoder.layers.0.self_attn.out_proj.weight', 'text_model.encoder.layers.0.self_attn.q_proj.bias', 'text_model.encoder.layers.0.self_attn.q_proj.weight', 'text_model.encoder.layers.0.self_attn.v_proj.bias', 'text_model.encoder.layers.0.self_attn.v_proj.weight', 'text_model.encoder.layers.1.layer_norm1.bias', 'text_model.encoder.layers.1.layer_norm1.weight', 'text_model.encoder.layers.1.layer_norm2.bias', 'text_model.encoder.layers.1.layer_norm2.weight', 'text_model.encoder.layers.1.mlp.fc1.bias', 'text_model.encoder.layers.1.mlp.fc1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.1.mlp.fc2.weight', 'text_model.encoder.layers.1.self_attn.k_proj.bias', 'text_model.encoder.layers.1.self_attn.k_proj.weight', 'text_model.encoder.layers.1.self_attn.out_proj.bias', 'text_model.encoder.layers.1.self_attn.out_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.bias', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.v_proj.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 
'text_model.encoder.layers.10.layer_norm1.bias', 'text_model.encoder.layers.10.layer_norm1.weight', 'text_model.encoder.layers.10.layer_norm2.bias', 'text_model.encoder.layers.10.layer_norm2.weight', 'text_model.encoder.layers.10.mlp.fc1.bias', 'text_model.encoder.layers.10.mlp.fc1.weight', 'text_model.encoder.layers.10.mlp.fc2.bias', 'text_model.encoder.layers.10.mlp.fc2.weight', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.10.self_attn.k_proj.weight', 'text_model.encoder.layers.10.self_attn.out_proj.bias', 'text_model.encoder.layers.10.self_attn.out_proj.weight', 'text_model.encoder.layers.10.self_attn.q_proj.bias', 'text_model.encoder.layers.10.self_attn.q_proj.weight', 'text_model.encoder.layers.10.self_attn.v_proj.bias', 'text_model.encoder.layers.10.self_attn.v_proj.weight', 'text_model.encoder.layers.11.layer_norm1.bias', 'text_model.encoder.layers.11.layer_norm1.weight', 'text_model.encoder.layers.11.layer_norm2.bias', 'text_model.encoder.layers.11.layer_norm2.weight', 'text_model.encoder.layers.11.mlp.fc1.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.11.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc2.weight', 'text_model.encoder.layers.11.self_attn.k_proj.bias', 'text_model.encoder.layers.11.self_attn.k_proj.weight', 'text_model.encoder.layers.11.self_attn.out_proj.bias', 'text_model.encoder.layers.11.self_attn.out_proj.weight', 'text_model.encoder.layers.11.self_attn.q_proj.bias', 'text_model.encoder.layers.11.self_attn.q_proj.weight', 'text_model.encoder.layers.11.self_attn.v_proj.bias', 'text_model.encoder.layers.11.self_attn.v_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.2.layer_norm1.weight', 'text_model.encoder.layers.2.layer_norm2.bias', 'text_model.encoder.layers.2.layer_norm2.weight', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.2.mlp.fc1.weight', 'text_model.encoder.layers.2.mlp.fc2.bias', 
'text_model.encoder.layers.2.mlp.fc2.weight', 'text_model.encoder.layers.2.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.weight', 'text_model.encoder.layers.2.self_attn.out_proj.bias', 'text_model.encoder.layers.2.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.q_proj.bias', 'text_model.encoder.layers.2.self_attn.q_proj.weight', 'text_model.encoder.layers.2.self_attn.v_proj.bias', 'text_model.encoder.layers.2.self_attn.v_proj.weight', 'text_model.encoder.layers.3.layer_norm1.bias', 'text_model.encoder.layers.3.layer_norm1.weight', 'text_model.encoder.layers.3.layer_norm2.bias', 'text_model.encoder.layers.3.layer_norm2.weight', 'text_model.encoder.layers.3.mlp.fc1.bias', 'text_model.encoder.layers.3.mlp.fc1.weight', 'text_model.encoder.layers.3.mlp.fc2.bias', 'text_model.encoder.layers.3.mlp.fc2.weight', 'text_model.encoder.layers.3.self_attn.k_proj.bias', 'text_model.encoder.layers.3.self_attn.k_proj.weight', 'text_model.encoder.layers.3.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.out_proj.weight', 'text_model.encoder.layers.3.self_attn.q_proj.bias', 'text_model.encoder.layers.3.self_attn.q_proj.weight', 'text_model.encoder.layers.3.self_attn.v_proj.bias', 'text_model.encoder.layers.3.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm1.bias', 'text_model.encoder.layers.4.layer_norm1.weight', 'text_model.encoder.layers.4.layer_norm2.bias', 'text_model.encoder.layers.4.layer_norm2.weight', 'text_model.encoder.layers.4.mlp.fc1.bias', 'text_model.encoder.layers.4.mlp.fc1.weight', 'text_model.encoder.layers.4.mlp.fc2.bias', 'text_model.encoder.layers.4.mlp.fc2.weight', 'text_model.encoder.layers.4.self_attn.k_proj.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.4.self_attn.out_proj.bias', 'text_model.encoder.layers.4.self_attn.out_proj.weight', 'text_model.encoder.layers.4.self_attn.q_proj.bias', 'text_model.encoder.layers.4.self_attn.q_proj.weight', 
'text_model.encoder.layers.4.self_attn.v_proj.bias', 'text_model.encoder.layers.4.self_attn.v_proj.weight', 'text_model.encoder.layers.5.layer_norm1.bias', 'text_model.encoder.layers.5.layer_norm1.weight', 'text_model.encoder.layers.5.layer_norm2.bias', 'text_model.encoder.layers.5.layer_norm2.weight', 'text_model.encoder.layers.5.mlp.fc1.bias', 'text_model.encoder.layers.5.mlp.fc1.weight', 'text_model.encoder.layers.5.mlp.fc2.bias', 'text_model.encoder.layers.5.mlp.fc2.weight', 'text_model.encoder.layers.5.self_attn.k_proj.bias', 'text_model.encoder.layers.5.self_attn.k_proj.weight', 'text_model.encoder.layers.5.self_attn.out_proj.bias', 'text_model.encoder.layers.5.self_attn.out_proj.weight', 'text_model.encoder.layers.5.self_attn.q_proj.bias', 'text_model.encoder.layers.5.self_attn.q_proj.weight', 'text_model.encoder.layers.5.self_attn.v_proj.bias', 'text_model.encoder.layers.5.self_attn.v_proj.weight', 'text_model.encoder.layers.6.layer_norm1.bias', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm2.bias', 'text_model.encoder.layers.6.layer_norm2.weight', 'text_model.encoder.layers.6.mlp.fc1.bias', 'text_model.encoder.layers.6.mlp.fc1.weight', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.6.mlp.fc2.weight', 'text_model.encoder.layers.6.self_attn.k_proj.bias', 'text_model.encoder.layers.6.self_attn.k_proj.weight', 'text_model.encoder.layers.6.self_attn.out_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.weight', 'text_model.encoder.layers.6.self_attn.q_proj.bias', 'text_model.encoder.layers.6.self_attn.q_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.bias', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm2.bias', 'text_model.encoder.layers.7.layer_norm2.weight', 'text_model.encoder.layers.7.mlp.fc1.bias', 
'text_model.encoder.layers.7.mlp.fc1.weight', 'text_model.encoder.layers.7.mlp.fc2.bias', 'text_model.encoder.layers.7.mlp.fc2.weight', 'text_model.encoder.layers.7.self_attn.k_proj.bias', 'text_model.encoder.layers.7.self_attn.k_proj.weight', 'text_model.encoder.layers.7.self_attn.out_proj.bias', 'text_model.encoder.layers.7.self_attn.out_proj.weight', 'text_model.encoder.layers.7.self_attn.q_proj.bias', 'text_model.encoder.layers.7.self_attn.q_proj.weight', 'text_model.encoder.layers.7.self_attn.v_proj.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.8.layer_norm1.bias', 'text_model.encoder.layers.8.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm2.bias', 'text_model.encoder.layers.8.layer_norm2.weight', 'text_model.encoder.layers.8.mlp.fc1.bias', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.8.mlp.fc2.weight', 'text_model.encoder.layers.8.self_attn.k_proj.bias', 'text_model.encoder.layers.8.self_attn.k_proj.weight', 'text_model.encoder.layers.8.self_attn.out_proj.bias', 'text_model.encoder.layers.8.self_attn.out_proj.weight', 'text_model.encoder.layers.8.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.8.self_attn.v_proj.bias', 'text_model.encoder.layers.8.self_attn.v_proj.weight', 'text_model.encoder.layers.9.layer_norm1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.9.layer_norm2.bias', 'text_model.encoder.layers.9.layer_norm2.weight', 'text_model.encoder.layers.9.mlp.fc1.bias', 'text_model.encoder.layers.9.mlp.fc1.weight', 'text_model.encoder.layers.9.mlp.fc2.bias', 'text_model.encoder.layers.9.mlp.fc2.weight', 'text_model.encoder.layers.9.self_attn.k_proj.bias', 'text_model.encoder.layers.9.self_attn.k_proj.weight', 'text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.9.self_attn.out_proj.weight', 
'text_model.encoder.layers.9.self_attn.q_proj.bias', 'text_model.encoder.layers.9.self_attn.q_proj.weight', 'text_model.encoder.layers.9.self_attn.v_proj.bias', 'text_model.encoder.layers.9.self_attn.v_proj.weight', 'text_model.final_layer_norm.bias', 'text_model.final_layer_norm.weight', 'text_projection.weight', 'visual_projection.weight'] +- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). +- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). +[INFO|modeling_utils.py:4358] 2025-01-23 00:21:16,129 >> All the weights of CLIPVisionModel were initialized from the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1. +If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPVisionModel for predictions without further training. 
+Rank 0: Loading pretrained mm_projector from ./checkpoints/llava-qwen25-32b-pretrain/mm_projector.bin +Rank 0: Using mm_tunable_parts: mm_vision_tower,mm_mlp_adapter,mm_language_model +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:34 - INFO - llava.train.train - Add dataset: llava-next-sft-notext with length: 738601, data type: normal, seed: 0 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:37 - INFO - llava.train.train - Add dataset: knowledge_gqa9k_art1500_cc3m30k with length: 40813, data type: know, seed: 1 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:41 - INFO - llava.train.train - Add dataset: Inferencial_flickr7k_cc3m30k_polished_md with length: 37117, data type: inf_polishmd, seed: 2 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:44 - INFO - llava.train.train - Add dataset: Detail_flickr7k_cc3m28k with length: 35313, data type: detail, seed: 3 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:49 - INFO - llava.train.train - Add dataset: Knowledge_instruct40k with length: 40218, data type: know_ins, seed: 4 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:52 - INFO - llava.train.train - Add dataset: Creation10k_fixed with length: 9698, data type: creation, seed: 5 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:56 - INFO - llava.train.train - Add dataset: Chartqa_generate_11k_gpt_qwen_merge with length: 11160, data type: chart, seed: 6 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:21:59 - INFO - llava.train.train - Add dataset: Tqa_detail_qwengenerate_multi8k_gpt with length: 8391, data type: tqa, seed: 7 +Rank 0: Formatting inputs...Skip in lazy mode +01/23/2025 00:22:03 - INFO - llava.train.train - Add dataset: Infovqa_single_gpt with length: 23068, data type: info, seed: 8 +Rank 0: Trainable parameters: ['model.image_newline', 'model.embed_tokens.weight', 'model.layers.0.self_attn.q_proj.weight', 
'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.mlp.gate_proj.weight', 'model.layers.0.mlp.up_proj.weight', 'model.layers.0.mlp.down_proj.weight', 'model.layers.0.input_layernorm.weight', 'model.layers.0.post_attention_layernorm.weight', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.1.input_layernorm.weight', 'model.layers.1.post_attention_layernorm.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.2.input_layernorm.weight', 'model.layers.2.post_attention_layernorm.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.3.input_layernorm.weight', 'model.layers.3.post_attention_layernorm.weight', 'model.layers.4.self_attn.q_proj.weight', 
'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.4.post_attention_layernorm.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.5.input_layernorm.weight', 'model.layers.5.post_attention_layernorm.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.6.input_layernorm.weight', 'model.layers.6.post_attention_layernorm.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.7.input_layernorm.weight', 'model.layers.7.post_attention_layernorm.weight', 'model.layers.8.self_attn.q_proj.weight', 
'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.8.input_layernorm.weight', 'model.layers.8.post_attention_layernorm.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.mlp.gate_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.9.input_layernorm.weight', 'model.layers.9.post_attention_layernorm.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.10.post_attention_layernorm.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.11.post_attention_layernorm.weight', 
'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.12.input_layernorm.weight', 'model.layers.12.post_attention_layernorm.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.13.input_layernorm.weight', 'model.layers.13.post_attention_layernorm.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.14.input_layernorm.weight', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.15.input_layernorm.weight', 
'model.layers.15.post_attention_layernorm.weight', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.16.input_layernorm.weight', 'model.layers.16.post_attention_layernorm.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.17.input_layernorm.weight', 'model.layers.17.post_attention_layernorm.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.mlp.gate_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.18.input_layernorm.weight', 'model.layers.18.post_attention_layernorm.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.19.mlp.down_proj.weight', 
'model.layers.19.input_layernorm.weight', 'model.layers.19.post_attention_layernorm.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.20.input_layernorm.weight', 'model.layers.20.post_attention_layernorm.weight', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.21.input_layernorm.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.22.input_layernorm.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.23.mlp.up_proj.weight', 
'model.layers.23.mlp.down_proj.weight', 'model.layers.23.input_layernorm.weight', 'model.layers.23.post_attention_layernorm.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.24.input_layernorm.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 
'model.layers.27.mlp.up_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.28.input_layernorm.weight', 'model.layers.28.post_attention_layernorm.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.29.input_layernorm.weight', 'model.layers.29.post_attention_layernorm.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.30.self_attn.q_proj.bias', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.30.self_attn.k_proj.bias', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.30.self_attn.v_proj.bias', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.30.input_layernorm.weight', 'model.layers.30.post_attention_layernorm.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.31.self_attn.q_proj.bias', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.31.self_attn.k_proj.bias', 'model.layers.31.self_attn.v_proj.weight', 'model.layers.31.self_attn.v_proj.bias', 'model.layers.31.self_attn.o_proj.weight', 
'model.layers.31.mlp.gate_proj.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.31.input_layernorm.weight', 'model.layers.31.post_attention_layernorm.weight', 'model.layers.32.self_attn.q_proj.weight', 'model.layers.32.self_attn.q_proj.bias', 'model.layers.32.self_attn.k_proj.weight', 'model.layers.32.self_attn.k_proj.bias', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.32.self_attn.v_proj.bias', 'model.layers.32.self_attn.o_proj.weight', 'model.layers.32.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.32.mlp.down_proj.weight', 'model.layers.32.input_layernorm.weight', 'model.layers.32.post_attention_layernorm.weight', 'model.layers.33.self_attn.q_proj.weight', 'model.layers.33.self_attn.q_proj.bias', 'model.layers.33.self_attn.k_proj.weight', 'model.layers.33.self_attn.k_proj.bias', 'model.layers.33.self_attn.v_proj.weight', 'model.layers.33.self_attn.v_proj.bias', 'model.layers.33.self_attn.o_proj.weight', 'model.layers.33.mlp.gate_proj.weight', 'model.layers.33.mlp.up_proj.weight', 'model.layers.33.mlp.down_proj.weight', 'model.layers.33.input_layernorm.weight', 'model.layers.33.post_attention_layernorm.weight', 'model.layers.34.self_attn.q_proj.weight', 'model.layers.34.self_attn.q_proj.bias', 'model.layers.34.self_attn.k_proj.weight', 'model.layers.34.self_attn.k_proj.bias', 'model.layers.34.self_attn.v_proj.weight', 'model.layers.34.self_attn.v_proj.bias', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.34.mlp.up_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.34.input_layernorm.weight', 'model.layers.34.post_attention_layernorm.weight', 'model.layers.35.self_attn.q_proj.weight', 'model.layers.35.self_attn.q_proj.bias', 'model.layers.35.self_attn.k_proj.weight', 'model.layers.35.self_attn.k_proj.bias', 'model.layers.35.self_attn.v_proj.weight', 'model.layers.35.self_attn.v_proj.bias', 
'model.layers.35.self_attn.o_proj.weight', 'model.layers.35.mlp.gate_proj.weight', 'model.layers.35.mlp.up_proj.weight', 'model.layers.35.mlp.down_proj.weight', 'model.layers.35.input_layernorm.weight', 'model.layers.35.post_attention_layernorm.weight', 'model.layers.36.self_attn.q_proj.weight', 'model.layers.36.self_attn.q_proj.bias', 'model.layers.36.self_attn.k_proj.weight', 'model.layers.36.self_attn.k_proj.bias', 'model.layers.36.self_attn.v_proj.weight', 'model.layers.36.self_attn.v_proj.bias', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.36.mlp.gate_proj.weight', 'model.layers.36.mlp.up_proj.weight', 'model.layers.36.mlp.down_proj.weight', 'model.layers.36.input_layernorm.weight', 'model.layers.36.post_attention_layernorm.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.37.self_attn.q_proj.bias', 'model.layers.37.self_attn.k_proj.weight', 'model.layers.37.self_attn.k_proj.bias', 'model.layers.37.self_attn.v_proj.weight', 'model.layers.37.self_attn.v_proj.bias', 'model.layers.37.self_attn.o_proj.weight', 'model.layers.37.mlp.gate_proj.weight', 'model.layers.37.mlp.up_proj.weight', 'model.layers.37.mlp.down_proj.weight', 'model.layers.37.input_layernorm.weight', 'model.layers.37.post_attention_layernorm.weight', 'model.layers.38.self_attn.q_proj.weight', 'model.layers.38.self_attn.q_proj.bias', 'model.layers.38.self_attn.k_proj.weight', 'model.layers.38.self_attn.k_proj.bias', 'model.layers.38.self_attn.v_proj.weight', 'model.layers.38.self_attn.v_proj.bias', 'model.layers.38.self_attn.o_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.38.input_layernorm.weight', 'model.layers.38.post_attention_layernorm.weight', 'model.layers.39.self_attn.q_proj.weight', 'model.layers.39.self_attn.q_proj.bias', 'model.layers.39.self_attn.k_proj.weight', 'model.layers.39.self_attn.k_proj.bias', 'model.layers.39.self_attn.v_proj.weight', 
'model.layers.39.self_attn.v_proj.bias', 'model.layers.39.self_attn.o_proj.weight', 'model.layers.39.mlp.gate_proj.weight', 'model.layers.39.mlp.up_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.39.input_layernorm.weight', 'model.layers.39.post_attention_layernorm.weight', 'model.layers.40.self_attn.q_proj.weight', 'model.layers.40.self_attn.q_proj.bias', 'model.layers.40.self_attn.k_proj.weight', 'model.layers.40.self_attn.k_proj.bias', 'model.layers.40.self_attn.v_proj.weight', 'model.layers.40.self_attn.v_proj.bias', 'model.layers.40.self_attn.o_proj.weight', 'model.layers.40.mlp.gate_proj.weight', 'model.layers.40.mlp.up_proj.weight', 'model.layers.40.mlp.down_proj.weight', 'model.layers.40.input_layernorm.weight', 'model.layers.40.post_attention_layernorm.weight', 'model.layers.41.self_attn.q_proj.weight', 'model.layers.41.self_attn.q_proj.bias', 'model.layers.41.self_attn.k_proj.weight', 'model.layers.41.self_attn.k_proj.bias', 'model.layers.41.self_attn.v_proj.weight', 'model.layers.41.self_attn.v_proj.bias', 'model.layers.41.self_attn.o_proj.weight', 'model.layers.41.mlp.gate_proj.weight', 'model.layers.41.mlp.up_proj.weight', 'model.layers.41.mlp.down_proj.weight', 'model.layers.41.input_layernorm.weight', 'model.layers.41.post_attention_layernorm.weight', 'model.layers.42.self_attn.q_proj.weight', 'model.layers.42.self_attn.q_proj.bias', 'model.layers.42.self_attn.k_proj.weight', 'model.layers.42.self_attn.k_proj.bias', 'model.layers.42.self_attn.v_proj.weight', 'model.layers.42.self_attn.v_proj.bias', 'model.layers.42.self_attn.o_proj.weight', 'model.layers.42.mlp.gate_proj.weight', 'model.layers.42.mlp.up_proj.weight', 'model.layers.42.mlp.down_proj.weight', 'model.layers.42.input_layernorm.weight', 'model.layers.42.post_attention_layernorm.weight', 'model.layers.43.self_attn.q_proj.weight', 'model.layers.43.self_attn.q_proj.bias', 'model.layers.43.self_attn.k_proj.weight', 'model.layers.43.self_attn.k_proj.bias', 
'model.layers.43.self_attn.v_proj.weight', 'model.layers.43.self_attn.v_proj.bias', 'model.layers.43.self_attn.o_proj.weight', 'model.layers.43.mlp.gate_proj.weight', 'model.layers.43.mlp.up_proj.weight', 'model.layers.43.mlp.down_proj.weight', 'model.layers.43.input_layernorm.weight', 'model.layers.43.post_attention_layernorm.weight', 'model.layers.44.self_attn.q_proj.weight', 'model.layers.44.self_attn.q_proj.bias', 'model.layers.44.self_attn.k_proj.weight', 'model.layers.44.self_attn.k_proj.bias', 'model.layers.44.self_attn.v_proj.weight', 'model.layers.44.self_attn.v_proj.bias', 'model.layers.44.self_attn.o_proj.weight', 'model.layers.44.mlp.gate_proj.weight', 'model.layers.44.mlp.up_proj.weight', 'model.layers.44.mlp.down_proj.weight', 'model.layers.44.input_layernorm.weight', 'model.layers.44.post_attention_layernorm.weight', 'model.layers.45.self_attn.q_proj.weight', 'model.layers.45.self_attn.q_proj.bias', 'model.layers.45.self_attn.k_proj.weight', 'model.layers.45.self_attn.k_proj.bias', 'model.layers.45.self_attn.v_proj.weight', 'model.layers.45.self_attn.v_proj.bias', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.45.mlp.gate_proj.weight', 'model.layers.45.mlp.up_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.46.self_attn.q_proj.weight', 'model.layers.46.self_attn.q_proj.bias', 'model.layers.46.self_attn.k_proj.weight', 'model.layers.46.self_attn.k_proj.bias', 'model.layers.46.self_attn.v_proj.weight', 'model.layers.46.self_attn.v_proj.bias', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.46.mlp.gate_proj.weight', 'model.layers.46.mlp.up_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.46.input_layernorm.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.47.self_attn.q_proj.weight', 'model.layers.47.self_attn.q_proj.bias', 'model.layers.47.self_attn.k_proj.weight', 
'model.layers.47.self_attn.k_proj.bias', 'model.layers.47.self_attn.v_proj.weight', 'model.layers.47.self_attn.v_proj.bias', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_proj.weight', 'model.layers.47.mlp.up_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.48.self_attn.q_proj.weight', 'model.layers.48.self_attn.q_proj.bias', 'model.layers.48.self_attn.k_proj.weight', 'model.layers.48.self_attn.k_proj.bias', 'model.layers.48.self_attn.v_proj.weight', 'model.layers.48.self_attn.v_proj.bias', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.48.mlp.gate_proj.weight', 'model.layers.48.mlp.up_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.self_attn.q_proj.weight', 'model.layers.49.self_attn.q_proj.bias', 'model.layers.49.self_attn.k_proj.weight', 'model.layers.49.self_attn.k_proj.bias', 'model.layers.49.self_attn.v_proj.weight', 'model.layers.49.self_attn.v_proj.bias', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.49.mlp.gate_proj.weight', 'model.layers.49.mlp.up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.50.self_attn.q_proj.bias', 'model.layers.50.self_attn.k_proj.weight', 'model.layers.50.self_attn.k_proj.bias', 'model.layers.50.self_attn.v_proj.weight', 'model.layers.50.self_attn.v_proj.bias', 'model.layers.50.self_attn.o_proj.weight', 'model.layers.50.mlp.gate_proj.weight', 'model.layers.50.mlp.up_proj.weight', 'model.layers.50.mlp.down_proj.weight', 'model.layers.50.input_layernorm.weight', 'model.layers.50.post_attention_layernorm.weight', 'model.layers.51.self_attn.q_proj.weight', 'model.layers.51.self_attn.q_proj.bias', 
'model.layers.51.self_attn.k_proj.weight', 'model.layers.51.self_attn.k_proj.bias', 'model.layers.51.self_attn.v_proj.weight', 'model.layers.51.self_attn.v_proj.bias', 'model.layers.51.self_attn.o_proj.weight', 'model.layers.51.mlp.gate_proj.weight', 'model.layers.51.mlp.up_proj.weight', 'model.layers.51.mlp.down_proj.weight', 'model.layers.51.input_layernorm.weight', 'model.layers.51.post_attention_layernorm.weight', 'model.layers.52.self_attn.q_proj.weight', 'model.layers.52.self_attn.q_proj.bias', 'model.layers.52.self_attn.k_proj.weight', 'model.layers.52.self_attn.k_proj.bias', 'model.layers.52.self_attn.v_proj.weight', 'model.layers.52.self_attn.v_proj.bias', 'model.layers.52.self_attn.o_proj.weight', 'model.layers.52.mlp.gate_proj.weight', 'model.layers.52.mlp.up_proj.weight', 'model.layers.52.mlp.down_proj.weight', 'model.layers.52.input_layernorm.weight', 'model.layers.52.post_attention_layernorm.weight', 'model.layers.53.self_attn.q_proj.weight', 'model.layers.53.self_attn.q_proj.bias', 'model.layers.53.self_attn.k_proj.weight', 'model.layers.53.self_attn.k_proj.bias', 'model.layers.53.self_attn.v_proj.weight', 'model.layers.53.self_attn.v_proj.bias', 'model.layers.53.self_attn.o_proj.weight', 'model.layers.53.mlp.gate_proj.weight', 'model.layers.53.mlp.up_proj.weight', 'model.layers.53.mlp.down_proj.weight', 'model.layers.53.input_layernorm.weight', 'model.layers.53.post_attention_layernorm.weight', 'model.layers.54.self_attn.q_proj.weight', 'model.layers.54.self_attn.q_proj.bias', 'model.layers.54.self_attn.k_proj.weight', 'model.layers.54.self_attn.k_proj.bias', 'model.layers.54.self_attn.v_proj.weight', 'model.layers.54.self_attn.v_proj.bias', 'model.layers.54.self_attn.o_proj.weight', 'model.layers.54.mlp.gate_proj.weight', 'model.layers.54.mlp.up_proj.weight', 'model.layers.54.mlp.down_proj.weight', 'model.layers.54.input_layernorm.weight', 'model.layers.54.post_attention_layernorm.weight', 'model.layers.55.self_attn.q_proj.weight', 
'model.layers.55.self_attn.q_proj.bias', 'model.layers.55.self_attn.k_proj.weight', 'model.layers.55.self_attn.k_proj.bias', 'model.layers.55.self_attn.v_proj.weight', 'model.layers.55.self_attn.v_proj.bias', 'model.layers.55.self_attn.o_proj.weight', 'model.layers.55.mlp.gate_proj.weight', 'model.layers.55.mlp.up_proj.weight', 'model.layers.55.mlp.down_proj.weight', 'model.layers.55.input_layernorm.weight', 'model.layers.55.post_attention_layernorm.weight', 'model.layers.56.self_attn.q_proj.weight', 'model.layers.56.self_attn.q_proj.bias', 'model.layers.56.self_attn.k_proj.weight', 'model.layers.56.self_attn.k_proj.bias', 'model.layers.56.self_attn.v_proj.weight', 'model.layers.56.self_attn.v_proj.bias', 'model.layers.56.self_attn.o_proj.weight', 'model.layers.56.mlp.gate_proj.weight', 'model.layers.56.mlp.up_proj.weight', 'model.layers.56.mlp.down_proj.weight', 'model.layers.56.input_layernorm.weight', 'model.layers.56.post_attention_layernorm.weight', 'model.layers.57.self_attn.q_proj.weight', 'model.layers.57.self_attn.q_proj.bias', 'model.layers.57.self_attn.k_proj.weight', 'model.layers.57.self_attn.k_proj.bias', 'model.layers.57.self_attn.v_proj.weight', 'model.layers.57.self_attn.v_proj.bias', 'model.layers.57.self_attn.o_proj.weight', 'model.layers.57.mlp.gate_proj.weight', 'model.layers.57.mlp.up_proj.weight', 'model.layers.57.mlp.down_proj.weight', 'model.layers.57.input_layernorm.weight', 'model.layers.57.post_attention_layernorm.weight', 'model.layers.58.self_attn.q_proj.weight', 'model.layers.58.self_attn.q_proj.bias', 'model.layers.58.self_attn.k_proj.weight', 'model.layers.58.self_attn.k_proj.bias', 'model.layers.58.self_attn.v_proj.weight', 'model.layers.58.self_attn.v_proj.bias', 'model.layers.58.self_attn.o_proj.weight', 'model.layers.58.mlp.gate_proj.weight', 'model.layers.58.mlp.up_proj.weight', 'model.layers.58.mlp.down_proj.weight', 'model.layers.58.input_layernorm.weight', 'model.layers.58.post_attention_layernorm.weight', 
'model.layers.59.self_attn.q_proj.weight', 'model.layers.59.self_attn.q_proj.bias', 'model.layers.59.self_attn.k_proj.weight', 'model.layers.59.self_attn.k_proj.bias', 'model.layers.59.self_attn.v_proj.weight', 'model.layers.59.self_attn.v_proj.bias', 'model.layers.59.self_attn.o_proj.weight', 'model.layers.59.mlp.gate_proj.weight', 'model.layers.59.mlp.up_proj.weight', 'model.layers.59.mlp.down_proj.weight', 'model.layers.59.input_layernorm.weight', 'model.layers.59.post_attention_layernorm.weight', 'model.layers.60.self_attn.q_proj.weight', 'model.layers.60.self_attn.q_proj.bias', 'model.layers.60.self_attn.k_proj.weight', 'model.layers.60.self_attn.k_proj.bias', 'model.layers.60.self_attn.v_proj.weight', 'model.layers.60.self_attn.v_proj.bias', 'model.layers.60.self_attn.o_proj.weight', 'model.layers.60.mlp.gate_proj.weight', 'model.layers.60.mlp.up_proj.weight', 'model.layers.60.mlp.down_proj.weight', 'model.layers.60.input_layernorm.weight', 'model.layers.60.post_attention_layernorm.weight', 'model.layers.61.self_attn.q_proj.weight', 'model.layers.61.self_attn.q_proj.bias', 'model.layers.61.self_attn.k_proj.weight', 'model.layers.61.self_attn.k_proj.bias', 'model.layers.61.self_attn.v_proj.weight', 'model.layers.61.self_attn.v_proj.bias', 'model.layers.61.self_attn.o_proj.weight', 'model.layers.61.mlp.gate_proj.weight', 'model.layers.61.mlp.up_proj.weight', 'model.layers.61.mlp.down_proj.weight', 'model.layers.61.input_layernorm.weight', 'model.layers.61.post_attention_layernorm.weight', 'model.layers.62.self_attn.q_proj.weight', 'model.layers.62.self_attn.q_proj.bias', 'model.layers.62.self_attn.k_proj.weight', 'model.layers.62.self_attn.k_proj.bias', 'model.layers.62.self_attn.v_proj.weight', 'model.layers.62.self_attn.v_proj.bias', 'model.layers.62.self_attn.o_proj.weight', 'model.layers.62.mlp.gate_proj.weight', 'model.layers.62.mlp.up_proj.weight', 'model.layers.62.mlp.down_proj.weight', 'model.layers.62.input_layernorm.weight', 
'model.layers.62.post_attention_layernorm.weight', 'model.layers.63.self_attn.q_proj.weight', 'model.layers.63.self_attn.q_proj.bias', 'model.layers.63.self_attn.k_proj.weight', 'model.layers.63.self_attn.k_proj.bias', 'model.layers.63.self_attn.v_proj.weight', 'model.layers.63.self_attn.v_proj.bias', 'model.layers.63.self_attn.o_proj.weight', 'model.layers.63.mlp.gate_proj.weight', 'model.layers.63.mlp.up_proj.weight', 'model.layers.63.mlp.down_proj.weight', 'model.layers.63.input_layernorm.weight', 'model.layers.63.post_attention_layernorm.weight', 'model.norm.weight', 'model.vision_tower.vision_tower.vision_model.embeddings.class_embedding', 'model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.weight', 'model.vision_tower.vision_tower.vision_model.embeddings.position_embedding.weight', 'model.vision_tower.vision_tower.vision_model.pre_layrnorm.weight', 'model.vision_tower.vision_tower.vision_model.pre_layrnorm.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias', 
'model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.post_layernorm.weight', 'model.vision_tower.vision_tower.vision_model.post_layernorm.bias', 'model.mm_projector.0.weight', 'model.mm_projector.0.bias', 'model.mm_projector.2.weight', 'model.mm_projector.2.bias', 'lm_head.weight'] +[INFO|trainer.py:571] 2025-01-23 00:22:03,207 >> Using auto half precision backend +[2025-01-23 00:22:14,689] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown +[2025-01-23 00:22:14,736] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +[2025-01-23 00:22:14,738] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer +[2025-01-23 00:22:14,739] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer +[2025-01-23 00:22:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW +[2025-01-23 00:22:14,826] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type= +[2025-01-23 00:22:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False +[2025-01-23 00:22:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer +[2025-01-23 00:22:17,747] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning +[2025-01-23 00:22:17,747] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 5.81 GB CA 8.19 GB Max_CA 42 GB +[2025-01-23 00:22:17,747] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 66.01 GB, 
percent = 8.3% +[2025-01-23 00:22:17,751] [INFO] [stage3.py:130:__init__] Reduce bucket size 26214400 +[2025-01-23 00:22:17,751] [INFO] [stage3.py:131:__init__] Prefetch bucket size 23592960 +[2025-01-23 00:22:20,660] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] +[2025-01-23 00:22:20,660] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 1.96 GB CA 8.19 GB Max_CA 8 GB +[2025-01-23 00:22:20,661] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 66.01 GB, percent = 8.3% +Parameter Offload: Total persistent parameters: 1459200 in 569 params +[2025-01-23 00:22:23,580] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end] +[2025-01-23 00:22:23,580] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 2.01 GB CA 8.19 GB Max_CA 8 GB +[2025-01-23 00:22:23,581] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:26,462] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions +[2025-01-23 00:22:26,462] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 1.96 GB CA 8.19 GB Max_CA 8 GB +[2025-01-23 00:22:26,462] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:33,203] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 4 +[2025-01-23 00:22:33,204] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 1.96 GB CA 8.69 GB Max_CA 9 GB +[2025-01-23 00:22:33,204] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:36,095] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions +[2025-01-23 00:22:36,095] [INFO] [utils.py:782:see_memory_usage] MA 1.96 GB Max_MA 1.96 GB CA 8.69 GB Max_CA 9 GB +[2025-01-23 00:22:36,096] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:39,023] [INFO] [utils.py:781:see_memory_usage] 
After creating fp32 partitions +[2025-01-23 00:22:39,023] [INFO] [utils.py:782:see_memory_usage] MA 3.89 GB Max_MA 4.83 GB CA 11.55 GB Max_CA 12 GB +[2025-01-23 00:22:39,023] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:41,943] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states +[2025-01-23 00:22:41,944] [INFO] [utils.py:782:see_memory_usage] MA 3.89 GB Max_MA 3.89 GB CA 11.55 GB Max_CA 12 GB +[2025-01-23 00:22:41,944] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:44,871] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states +[2025-01-23 00:22:44,872] [INFO] [utils.py:782:see_memory_usage] MA 3.89 GB Max_MA 5.8 GB CA 13.46 GB Max_CA 13 GB +[2025-01-23 00:22:44,872] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.66 GB, percent = 8.2% +[2025-01-23 00:22:44,872] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized +[2025-01-23 00:22:50,301] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer +[2025-01-23 00:22:50,302] [INFO] [utils.py:782:see_memory_usage] MA 4.9 GB Max_MA 7.8 GB CA 16.37 GB Max_CA 16 GB +[2025-01-23 00:22:50,302] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 65.67 GB, percent = 8.2% +[2025-01-23 00:22:50,302] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3 +[2025-01-23 00:22:50,302] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler +[2025-01-23 00:22:50,302] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None +[2025-01-23 00:22:50,303] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] +[2025-01-23 00:22:50,304] [INFO] [config.py:997:print] DeepSpeedEngine configuration: +[2025-01-23 00:22:50,305] [INFO] 
[config.py:1001:print] activation_checkpointing_config { + "partition_activations": false, + "contiguous_memory_optimization": false, + "cpu_checkpointing": false, + "number_checkpoints": null, + "synchronize_checkpoint_boundary": false, + "profile": false +} +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] amp_enabled .................. False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] amp_params ................... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] autotuning_config ............ { + "enabled": false, + "start_step": null, + "end_step": null, + "metric_path": null, + "arg_mappings": null, + "metric": "throughput", + "model_info": null, + "results_dir": "autotuning_results", + "exps_dir": "autotuning_exps", + "overwrite": true, + "fast": true, + "start_profile_step": 3, + "end_profile_step": 5, + "tuner_type": "gridsearch", + "tuner_early_stopping": 5, + "tuner_num_trials": 50, + "model_info_path": null, + "mp_size": 1, + "max_train_batch_size": null, + "min_train_batch_size": 1, + "max_train_micro_batch_size_per_gpu": 1.024000e+03, + "min_train_micro_batch_size_per_gpu": 1, + "num_tuning_micro_batch_sizes": 3 +} +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] bfloat16_enabled ............. True +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] bfloat16_immediate_grad_update False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] checkpoint_parallel_write_pipeline False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] checkpoint_tag_validation_enabled True +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] checkpoint_tag_validation_fail False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] comms_config ................. 
+[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] communication_data_type ...... None +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] curriculum_enabled_legacy .... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] curriculum_params_legacy ..... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] data_efficiency_enabled ...... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] dataloader_drop_last ......... 
False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] disable_allgather ............ False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] dump_state ................... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] dynamic_loss_scale_args ...... None +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_enabled ........... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_gas_boundary_resolution 1 +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_layer_name ........ bert.encoder.layer +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_layer_num ......... 0 +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_max_iter .......... 100 +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_stability ......... 1e-06 +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_tol ............... 0.01 +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] eigenvalue_verbose ........... False +[2025-01-23 00:22:50,305] [INFO] [config.py:1001:print] elasticity_enabled ........... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] flops_profiler_config ........ { + "enabled": false, + "recompute_fwd_factor": 0.0, + "profile_step": 1, + "module_depth": -1, + "top_modules": 1, + "detailed": true, + "output_file": null +} +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] fp16_auto_cast ............... None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] fp16_enabled ................. False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] fp16_master_weights_and_gradients False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] global_rank .................. 0 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] grad_accum_dtype ............. None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] gradient_accumulation_steps .. 
2 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] gradient_clipping ............ 0.0 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] gradient_predivide_factor .... 1.0 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] graph_harvesting ............. False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] initial_dynamic_scale ........ 1 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] load_universal_checkpoint .... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] loss_scale ................... 1.0 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] memory_breakdown ............. False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] mics_hierarchial_params_gather False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] mics_shard_size .............. -1 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] nebula_config ................ { + "enabled": false, + "persistent_storage_path": null, + "persistent_time_interval": 100, + "num_of_version_in_retention": 2, + "enable_nebula_load": true, + "load_path": null +} +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] optimizer_legacy_fusion ...... 
False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] optimizer_name ............... None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] optimizer_params ............. None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] pld_enabled .................. False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] pld_params ................... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] prescale_gradients ........... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] scheduler_name ............... None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] scheduler_params ............. None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] seq_parallel_communication_data_type torch.float32 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] sparse_attention ............. None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] sparse_gradients_enabled ..... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] steps_per_print .............. inf +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] timers_config ................ enabled=True synchronized=True +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] train_batch_size ............. 128 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] train_micro_batch_size_per_gpu 1 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] use_data_before_expert_parallel_ False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] use_node_local_storage ....... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] wall_clock_breakdown ......... False +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] weight_quantization_config ... 
None +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] world_size ................... 64 +[2025-01-23 00:22:50,306] [INFO] [config.py:1001:print] zero_allow_untested_optimizer True +[2025-01-23 00:22:50,307] [INFO] [config.py:1001:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=26214400 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=23592960 param_persistence_threshold=51200 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True +[2025-01-23 00:22:50,307] [INFO] [config.py:1001:print] zero_enabled ................. True +[2025-01-23 00:22:50,307] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. True +[2025-01-23 00:22:50,307] [INFO] [config.py:1001:print] zero_optimization_stage ...... 
3 +[2025-01-23 00:22:50,307] [INFO] [config.py:987:print_user_config] json = { + "fp16": { + "enabled": false, + "loss_scale": 0, + "loss_scale_window": 1000, + "initial_scale_power": 16, + "hysteresis": 2, + "min_loss_scale": 1 + }, + "bf16": { + "enabled": true + }, + "train_micro_batch_size_per_gpu": 1, + "train_batch_size": 128, + "gradient_accumulation_steps": 2, + "zero_optimization": { + "stage": 3, + "overlap_comm": true, + "contiguous_gradients": true, + "sub_group_size": 1.000000e+09, + "reduce_bucket_size": 2.621440e+07, + "stage3_prefetch_bucket_size": 2.359296e+07, + "stage3_param_persistence_threshold": 5.120000e+04, + "stage3_max_live_parameters": 1.000000e+09, + "stage3_max_reuse_distance": 1.000000e+09, + "stage3_gather_16bit_weights_on_model_save": true + }, + "steps_per_print": inf, + "zero_allow_untested_optimizer": true +} +[INFO|trainer.py:1721] 2025-01-23 00:22:50,307 >> ***** Running training ***** +[INFO|trainer.py:1722] 2025-01-23 00:22:50,307 >> Num examples = 944,379 +[INFO|trainer.py:1723] 2025-01-23 00:22:50,307 >> Num Epochs = 1 +[INFO|trainer.py:1724] 2025-01-23 00:22:50,307 >> Instantaneous batch size per device = 1 +[INFO|trainer.py:1727] 2025-01-23 00:22:50,307 >> Total train batch size (w. parallel, distributed & accumulation) = 128 +[INFO|trainer.py:1728] 2025-01-23 00:22:50,307 >> Gradient Accumulation steps = 2 +[INFO|trainer.py:1729] 2025-01-23 00:22:50,307 >> Total optimization steps = 7,378 +[INFO|trainer.py:1730] 2025-01-23 00:22:50,310 >> Number of trainable parameters = 33,098,856,448 +[INFO|integration_utils.py:722] 2025-01-23 00:22:50,313 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information. +wandb: Currently logged in as: z2855064151 (openmmlab_zxy). Use `wandb login --relogin` to force relogin +wandb: - Waiting for wandb.init()... 
+wandb: \ Waiting for wandb.init()... +wandb: Tracking run with wandb version 0.18.5 +wandb: Run data is saved locally in /cpfs02/user/zhaoxiangyu/code_new/LLaVA/wandb/run-20250123_002254-x4s5p1ck +wandb: Run `wandb offline` to turn off syncing. +wandb: Syncing run llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt +wandb: ⭐️ View project at https://wandb.ai/openmmlab_zxy/huggingface +wandb: 🚀 View run at https://wandb.ai/openmmlab_zxy/huggingface/runs/x4s5p1ck + + 0%| | 0/7378 [00:007->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] 0/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/-1/-1->7->6 +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/38/-1->6->-1 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->14 +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] -1/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] -1/-1/-1->5->4 +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/36/-1->4->-1 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->12 [7] 5/-1/-1->4->3 +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/34/-1->2->-1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->10 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 
13 12 9 8 11 18 23 22 21 20 17 16 19 26 31 +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 20 19 18 17 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] 4/-1/-1->3->2 +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 22 19 18 17 16 21 20 23 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 18 23 22 21 20 17 16 19 26 31 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 20 19 18 17 16 23 22 21 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 22 19 18 17 16 21 20 23 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Trees [0] 1/32/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 03/0 : 4[4] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 01/0 : 2[2] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO 
Channel 05/0 : 0[0] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 07/0 : 4[4] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 05/0 : 2[2] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 03/0 : 63[7] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 07/0 : 63[7] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 03/0 : 0[0] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 07/0 : 0[0] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 02/0 : 61[5] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 06/0 : 61[5] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/0 : 57[1] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/0 : 57[1] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 01/0 : 59[3] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 05/0 : 59[3] -> 2[2] [receive] via 
NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 03/0 : 7[7] -> 14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 01/0 : 4[4] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 05/0 : 4[4] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 03/0 : 6[6] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 07/0 : 6[6] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via 
P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 
[5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 
07/0 : 0[0] -> 1[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 03/0 : 0[0] -> 
7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 03/0 : 38[6] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO 
Channel 03/0 : 6[6] -> 38[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 02/0 : 36[4] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 02/0 : 4[4] -> 36[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/0 : 32[0] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 00/0 : 0[0] -> 32[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 01/0 : 34[2] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 01/0 : 2[2] -> 34[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer 
+dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p 
channels per peer +dlc1irjyfb0zt5ew-master-0:80:3211 [6] NCCL INFO comm 0x7fcfe0044110 rank 6 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:78:3216 [4] NCCL INFO comm 0x7f0fd0044270 rank 4 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:76:3212 [2] NCCL INFO comm 0x7fbc0c0444d0 rank 2 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:74:3210 [0] NCCL INFO comm 0x7f930c044ad0 rank 0 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:77:3213 [3] NCCL INFO comm 0x7f9194044480 rank 3 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:75:3215 [1] NCCL INFO comm 0x7f0ff80447d0 rank 1 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:81:3214 [7] NCCL INFO comm 0x7f9b400444d0 rank 7 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-master-0:79:3217 [5] NCCL INFO comm 0x7fb0580447f0 rank 5 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0x7f74f3499b0b795a - Init COMPLETE + + 0%| | 1/7378 [00:31<65:20:05, 31.88s/it] + +{'loss': 3.507, 'learning_rate': 9.00900900900901e-08, 'epoch': 0.0} + + 0%| | 1/7378 [00:31<65:20:05, 31.88s/it] + 0%| | 2/7378 [00:44<41:32:34, 20.28s/it] + +{'loss': 3.4428, 'learning_rate': 1.801801801801802e-07, 'epoch': 0.0} + + 0%| | 2/7378 [00:44<41:32:34, 20.28s/it] + 0%| | 3/7378 [00:56<33:53:17, 16.54s/it] + +{'loss': 3.3907, 'learning_rate': 2.702702702702703e-07, 'epoch': 0.0} + + 0%| | 3/7378 [00:56<33:53:17, 16.54s/it] + 0%| | 4/7378 [01:08<30:08:07, 14.71s/it] + +{'loss': 3.488, 'learning_rate': 3.603603603603604e-07, 'epoch': 0.0} + + 0%| | 4/7378 [01:08<30:08:07, 14.71s/it] + 0%| | 5/7378 [01:20<28:10:25, 13.76s/it] + +{'loss': 3.4441, 
'learning_rate': 4.504504504504505e-07, 'epoch': 0.0} + + 0%| | 5/7378 [01:20<28:10:25, 13.76s/it] + 0%| | 6/7378 [01:32<26:52:51, 13.13s/it] + +{'loss': 3.3525, 'learning_rate': 5.405405405405406e-07, 'epoch': 0.0} + + 0%| | 6/7378 [01:32<26:52:51, 13.13s/it] + 0%| | 7/7378 [01:44<26:25:21, 12.90s/it] + +{'loss': 3.3938, 'learning_rate': 6.306306306306306e-07, 'epoch': 0.0} + + 0%| | 7/7378 [01:44<26:25:21, 12.90s/it] + 0%| | 8/7378 [01:56<26:01:16, 12.71s/it] + +{'loss': 3.2999, 'learning_rate': 7.207207207207208e-07, 'epoch': 0.0} + + 0%| | 8/7378 [01:56<26:01:16, 12.71s/it] + 0%| | 9/7378 [02:09<25:44:19, 12.57s/it] + +{'loss': 3.3, 'learning_rate': 8.108108108108109e-07, 'epoch': 0.0} + + 0%| | 9/7378 [02:09<25:44:19, 12.57s/it] + 0%| | 10/7378 [02:21<25:26:01, 12.43s/it] + +{'loss': 3.1621, 'learning_rate': 9.00900900900901e-07, 'epoch': 0.0} + + 0%| | 10/7378 [02:21<25:26:01, 12.43s/it] + 0%| | 11/7378 [02:33<25:10:29, 12.30s/it] + +{'loss': 2.821, 'learning_rate': 9.909909909909911e-07, 'epoch': 0.0} + + 0%| | 11/7378 [02:33<25:10:29, 12.30s/it] + 0%| | 12/7378 [02:45<24:58:28, 12.21s/it] + +{'loss': 2.7306, 'learning_rate': 1.0810810810810812e-06, 'epoch': 0.0} + + 0%| | 12/7378 [02:45<24:58:28, 12.21s/it] + 0%| | 13/7378 [02:57<25:03:43, 12.25s/it] + +{'loss': 2.7128, 'learning_rate': 1.1711711711711712e-06, 'epoch': 0.0} + + 0%| | 13/7378 [02:57<25:03:43, 12.25s/it] + 0%| | 14/7378 [03:09<25:07:42, 12.28s/it] + +{'loss': 2.6001, 'learning_rate': 1.2612612612612613e-06, 'epoch': 0.0} + + 0%| | 14/7378 [03:09<25:07:42, 12.28s/it] + 0%| | 15/7378 [03:22<25:07:22, 12.28s/it] + +{'loss': 2.2049, 'learning_rate': 1.3513513513513515e-06, 'epoch': 0.0} + + 0%| | 15/7378 [03:22<25:07:22, 12.28s/it] + 0%| | 16/7378 [03:34<24:58:43, 12.21s/it] + +{'loss': 2.1044, 'learning_rate': 1.4414414414414416e-06, 'epoch': 0.0} + + 0%| | 16/7378 [03:34<24:58:43, 12.21s/it] + 0%| | 17/7378 [03:46<24:54:31, 12.18s/it] + +{'loss': 1.9917, 'learning_rate': 1.5315315315315316e-06, 
'epoch': 0.0} + + 0%| | 17/7378 [03:46<24:54:31, 12.18s/it] + 0%| | 18/7378 [03:58<25:04:39, 12.27s/it] + +{'loss': 1.9391, 'learning_rate': 1.6216216216216219e-06, 'epoch': 0.0} + + 0%| | 18/7378 [03:58<25:04:39, 12.27s/it] + 0%| | 19/7378 [04:11<25:27:44, 12.46s/it] + +{'loss': 1.9176, 'learning_rate': 1.711711711711712e-06, 'epoch': 0.0} + + 0%| | 19/7378 [04:11<25:27:44, 12.46s/it] + 0%| | 20/7378 [04:23<25:21:33, 12.41s/it] + +{'loss': 1.7942, 'learning_rate': 1.801801801801802e-06, 'epoch': 0.0} + + 0%| | 20/7378 [04:23<25:21:33, 12.41s/it] + 0%| | 21/7378 [04:35<25:01:49, 12.25s/it] + +{'loss': 1.5659, 'learning_rate': 1.8918918918918922e-06, 'epoch': 0.0} + + 0%| | 21/7378 [04:35<25:01:49, 12.25s/it] + 0%| | 22/7378 [04:48<25:04:16, 12.27s/it] + +{'loss': 1.5879, 'learning_rate': 1.9819819819819822e-06, 'epoch': 0.0} + + 0%| | 22/7378 [04:48<25:04:16, 12.27s/it] + 0%| | 23/7378 [05:00<25:09:33, 12.31s/it] + +{'loss': 1.4683, 'learning_rate': 2.0720720720720723e-06, 'epoch': 0.0} + + 0%| | 23/7378 [05:00<25:09:33, 12.31s/it] + 0%| | 24/7378 [05:12<25:08:31, 12.31s/it] + +{'loss': 1.5708, 'learning_rate': 2.1621621621621623e-06, 'epoch': 0.0} + + 0%| | 24/7378 [05:12<25:08:31, 12.31s/it] + 0%| | 25/7378 [05:25<25:06:09, 12.29s/it] + +{'loss': 1.4068, 'learning_rate': 2.2522522522522524e-06, 'epoch': 0.0} + + 0%| | 25/7378 [05:25<25:06:09, 12.29s/it] + 0%| | 26/7378 [05:37<25:01:35, 12.25s/it] + +{'loss': 1.3716, 'learning_rate': 2.3423423423423424e-06, 'epoch': 0.0} + + 0%| | 26/7378 [05:37<25:01:35, 12.25s/it] + 0%| | 27/7378 [05:49<25:08:39, 12.31s/it] + +{'loss': 1.3104, 'learning_rate': 2.432432432432433e-06, 'epoch': 0.0} + + 0%| | 27/7378 [05:49<25:08:39, 12.31s/it] + 0%| | 28/7378 [06:01<24:59:58, 12.24s/it] + +{'loss': 1.2858, 'learning_rate': 2.5225225225225225e-06, 'epoch': 0.0} + + 0%| | 28/7378 [06:01<24:59:58, 12.24s/it] + 0%| | 29/7378 [06:13<24:52:40, 12.19s/it] + +{'loss': 1.1873, 'learning_rate': 2.612612612612613e-06, 'epoch': 0.0} + + 0%| | 
29/7378 [06:13<24:52:40, 12.19s/it] + 0%| | 30/7378 [06:26<25:01:13, 12.26s/it] + +{'loss': 1.2704, 'learning_rate': 2.702702702702703e-06, 'epoch': 0.0} + + 0%| | 30/7378 [06:26<25:01:13, 12.26s/it] + 0%| | 31/7378 [06:38<25:08:52, 12.32s/it] + +{'loss': 1.2162, 'learning_rate': 2.7927927927927926e-06, 'epoch': 0.0} + + 0%| | 31/7378 [06:38<25:08:52, 12.32s/it] + 0%| | 32/7378 [06:51<25:06:31, 12.30s/it] + +{'loss': 1.2056, 'learning_rate': 2.882882882882883e-06, 'epoch': 0.0} + + 0%| | 32/7378 [06:51<25:06:31, 12.30s/it] + 0%| | 33/7378 [07:02<24:54:15, 12.21s/it] + +{'loss': 1.1629, 'learning_rate': 2.9729729729729736e-06, 'epoch': 0.0} + + 0%| | 33/7378 [07:03<24:54:15, 12.21s/it] + 0%| | 34/7378 [07:15<25:15:30, 12.38s/it] + +{'loss': 1.0852, 'learning_rate': 3.063063063063063e-06, 'epoch': 0.0} + + 0%| | 34/7378 [07:15<25:15:30, 12.38s/it] + 0%| | 35/7378 [07:28<25:24:44, 12.46s/it] + +{'loss': 1.0375, 'learning_rate': 3.1531531531531532e-06, 'epoch': 0.0} + + 0%| | 35/7378 [07:28<25:24:44, 12.46s/it] + 0%| | 36/7378 [07:41<25:40:28, 12.59s/it] + +{'loss': 1.1485, 'learning_rate': 3.2432432432432437e-06, 'epoch': 0.0} + + 0%| | 36/7378 [07:41<25:40:28, 12.59s/it] + 1%| | 37/7378 [07:53<25:22:43, 12.45s/it] + +{'loss': 1.1327, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.01} + + 1%| | 37/7378 [07:53<25:22:43, 12.45s/it] + 1%| | 38/7378 [08:05<25:10:36, 12.35s/it] + +{'loss': 0.9639, 'learning_rate': 3.423423423423424e-06, 'epoch': 0.01} + + 1%| | 38/7378 [08:05<25:10:36, 12.35s/it] + 1%| | 39/7378 [08:18<25:19:52, 12.43s/it] + +{'loss': 0.9763, 'learning_rate': 3.513513513513514e-06, 'epoch': 0.01} + + 1%| | 39/7378 [08:18<25:19:52, 12.43s/it] + 1%| | 40/7378 [08:30<25:11:09, 12.36s/it] + +{'loss': 0.9258, 'learning_rate': 3.603603603603604e-06, 'epoch': 0.01} + + 1%| | 40/7378 [08:30<25:11:09, 12.36s/it] + 1%| | 41/7378 [08:42<24:59:50, 12.27s/it] + +{'loss': 1.004, 'learning_rate': 3.693693693693694e-06, 'epoch': 0.01} + + 1%| | 41/7378 
[08:42<24:59:50, 12.27s/it] + 1%| | 42/7378 [08:54<24:59:15, 12.26s/it] + +{'loss': 0.9317, 'learning_rate': 3.7837837837837844e-06, 'epoch': 0.01} + + 1%| | 42/7378 [08:54<24:59:15, 12.26s/it] + 1%| | 43/7378 [09:06<24:47:57, 12.17s/it] + +{'loss': 0.9979, 'learning_rate': 3.8738738738738744e-06, 'epoch': 0.01} + + 1%| | 43/7378 [09:06<24:47:57, 12.17s/it] + 1%| | 44/7378 [09:18<24:42:43, 12.13s/it] + +{'loss': 0.8237, 'learning_rate': 3.9639639639639645e-06, 'epoch': 0.01} + + 1%| | 44/7378 [09:18<24:42:43, 12.13s/it] + 1%| | 45/7378 [09:30<24:32:21, 12.05s/it] + +{'loss': 0.9358, 'learning_rate': 4.0540540540540545e-06, 'epoch': 0.01} + + 1%| | 45/7378 [09:30<24:32:21, 12.05s/it] + 1%| | 46/7378 [09:42<24:39:27, 12.11s/it] + +{'loss': 0.8797, 'learning_rate': 4.1441441441441446e-06, 'epoch': 0.01} + + 1%| | 46/7378 [09:42<24:39:27, 12.11s/it] + 1%| | 47/7378 [09:54<24:31:03, 12.04s/it] + +{'loss': 0.8941, 'learning_rate': 4.234234234234235e-06, 'epoch': 0.01} + + 1%| | 47/7378 [09:54<24:31:03, 12.04s/it] + 1%| | 48/7378 [10:06<24:37:22, 12.09s/it] + +{'loss': 0.8541, 'learning_rate': 4.324324324324325e-06, 'epoch': 0.01} + + 1%| | 48/7378 [10:06<24:37:22, 12.09s/it] + 1%| | 49/7378 [10:18<24:38:32, 12.10s/it] + +{'loss': 0.7519, 'learning_rate': 4.414414414414415e-06, 'epoch': 0.01} + + 1%| | 49/7378 [10:18<24:38:32, 12.10s/it] + 1%| | 50/7378 [10:31<24:41:14, 12.13s/it] + +{'loss': 0.8109, 'learning_rate': 4.504504504504505e-06, 'epoch': 0.01} + + 1%| | 50/7378 [10:31<24:41:14, 12.13s/it] + 1%| | 51/7378 [10:43<24:35:50, 12.09s/it] + +{'loss': 0.9799, 'learning_rate': 4.594594594594596e-06, 'epoch': 0.01} + + 1%| | 51/7378 [10:43<24:35:50, 12.09s/it] + 1%| | 52/7378 [10:55<24:59:29, 12.28s/it] + +{'loss': 0.7671, 'learning_rate': 4.684684684684685e-06, 'epoch': 0.01} + + 1%| | 52/7378 [10:55<24:59:29, 12.28s/it] + 1%| | 53/7378 [11:08<24:59:28, 12.28s/it] + +{'loss': 0.8176, 'learning_rate': 4.774774774774775e-06, 'epoch': 0.01} + + 1%| | 53/7378 
[11:08<24:59:28, 12.28s/it] + 1%| | 54/7378 [11:20<24:47:12, 12.18s/it] + +{'loss': 0.8327, 'learning_rate': 4.864864864864866e-06, 'epoch': 0.01} + + 1%| | 54/7378 [11:20<24:47:12, 12.18s/it] + 1%| | 55/7378 [11:32<24:51:26, 12.22s/it] + +{'loss': 0.7395, 'learning_rate': 4.954954954954955e-06, 'epoch': 0.01} + + 1%| | 55/7378 [11:32<24:51:26, 12.22s/it] + 1%| | 56/7378 [11:44<24:42:14, 12.15s/it] + +{'loss': 0.7989, 'learning_rate': 5.045045045045045e-06, 'epoch': 0.01} + + 1%| | 56/7378 [11:44<24:42:14, 12.15s/it] + 1%| | 57/7378 [11:56<24:47:35, 12.19s/it] + +{'loss': 0.8054, 'learning_rate': 5.135135135135135e-06, 'epoch': 0.01} + + 1%| | 57/7378 [11:56<24:47:35, 12.19s/it] + 1%| | 58/7378 [12:09<24:57:01, 12.27s/it] + +{'loss': 0.8339, 'learning_rate': 5.225225225225226e-06, 'epoch': 0.01} + + 1%| | 58/7378 [12:09<24:57:01, 12.27s/it] + 1%| | 59/7378 [12:21<24:49:19, 12.21s/it] + +{'loss': 0.8327, 'learning_rate': 5.315315315315316e-06, 'epoch': 0.01} + + 1%| | 59/7378 [12:21<24:49:19, 12.21s/it] + 1%| | 60/7378 [12:33<24:51:44, 12.23s/it] + +{'loss': 0.7512, 'learning_rate': 5.405405405405406e-06, 'epoch': 0.01} + + 1%| | 60/7378 [12:33<24:51:44, 12.23s/it] + 1%| | 61/7378 [12:45<24:41:22, 12.15s/it] + +{'loss': 0.7848, 'learning_rate': 5.495495495495496e-06, 'epoch': 0.01} + + 1%| | 61/7378 [12:45<24:41:22, 12.15s/it] + 1%| | 62/7378 [12:57<24:39:48, 12.14s/it] + +{'loss': 0.7846, 'learning_rate': 5.585585585585585e-06, 'epoch': 0.01} + + 1%| | 62/7378 [12:57<24:39:48, 12.14s/it] + 1%| | 63/7378 [13:10<24:55:01, 12.26s/it] + +{'loss': 0.784, 'learning_rate': 5.675675675675676e-06, 'epoch': 0.01} + + 1%| | 63/7378 [13:10<24:55:01, 12.26s/it] + 1%| | 64/7378 [13:22<24:46:25, 12.19s/it] + +{'loss': 0.7831, 'learning_rate': 5.765765765765766e-06, 'epoch': 0.01} + + 1%| | 64/7378 [13:22<24:46:25, 12.19s/it] + 1%| | 65/7378 [13:34<24:55:22, 12.27s/it] + +{'loss': 0.7753, 'learning_rate': 5.855855855855856e-06, 'epoch': 0.01} + + 1%| | 65/7378 [13:34<24:55:22, 
12.27s/it] + 1%| | 66/7378 [13:46<24:47:34, 12.21s/it] + +{'loss': 0.8016, 'learning_rate': 5.945945945945947e-06, 'epoch': 0.01} + + 1%| | 66/7378 [13:46<24:47:34, 12.21s/it] + 1%| | 67/7378 [13:58<24:41:45, 12.16s/it] + +{'loss': 0.7226, 'learning_rate': 6.036036036036037e-06, 'epoch': 0.01} + + 1%| | 67/7378 [13:58<24:41:45, 12.16s/it] + 1%| | 68/7378 [14:10<24:43:11, 12.17s/it] + +{'loss': 0.7807, 'learning_rate': 6.126126126126126e-06, 'epoch': 0.01} + + 1%| | 68/7378 [14:10<24:43:11, 12.17s/it] + 1%| | 69/7378 [14:23<24:41:30, 12.16s/it] + +{'loss': 0.7924, 'learning_rate': 6.2162162162162164e-06, 'epoch': 0.01} + + 1%| | 69/7378 [14:23<24:41:30, 12.16s/it] + 1%| | 70/7378 [14:35<24:46:18, 12.20s/it] + +{'loss': 0.8044, 'learning_rate': 6.3063063063063065e-06, 'epoch': 0.01} + + 1%| | 70/7378 [14:35<24:46:18, 12.20s/it] + 1%| | 71/7378 [14:47<24:31:23, 12.08s/it] + +{'loss': 0.6349, 'learning_rate': 6.396396396396397e-06, 'epoch': 0.01} + + 1%| | 71/7378 [14:47<24:31:23, 12.08s/it] + 1%| | 72/7378 [14:59<24:33:50, 12.10s/it] + +{'loss': 0.6503, 'learning_rate': 6.486486486486487e-06, 'epoch': 0.01} + + 1%| | 72/7378 [14:59<24:33:50, 12.10s/it] + 1%| | 73/7378 [15:11<24:48:43, 12.23s/it] + +{'loss': 0.7217, 'learning_rate': 6.5765765765765775e-06, 'epoch': 0.01} + + 1%| | 73/7378 [15:11<24:48:43, 12.23s/it] + 1%| | 74/7378 [15:23<24:43:26, 12.19s/it] + +{'loss': 0.6997, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.01} + + 1%| | 74/7378 [15:23<24:43:26, 12.19s/it] + 1%| | 75/7378 [15:36<25:01:32, 12.34s/it] + +{'loss': 0.7482, 'learning_rate': 6.7567567567567575e-06, 'epoch': 0.01} + + 1%| | 75/7378 [15:36<25:01:32, 12.34s/it] + 1%| | 76/7378 [15:48<24:54:30, 12.28s/it] + +{'loss': 0.6859, 'learning_rate': 6.846846846846848e-06, 'epoch': 0.01} + + 1%| | 76/7378 [15:48<24:54:30, 12.28s/it] + 1%| | 77/7378 [16:01<25:06:51, 12.38s/it] + +{'loss': 0.6687, 'learning_rate': 6.936936936936938e-06, 'epoch': 0.01} + + 1%| | 77/7378 [16:01<25:06:51, 12.38s/it] + 
1%| | 78/7378 [16:13<25:04:56, 12.37s/it] + +{'loss': 0.7083, 'learning_rate': 7.027027027027028e-06, 'epoch': 0.01} + + 1%| | 78/7378 [16:13<25:04:56, 12.37s/it] + 1%| | 79/7378 [16:26<25:15:58, 12.46s/it] + +{'loss': 0.7573, 'learning_rate': 7.117117117117117e-06, 'epoch': 0.01} + + 1%| | 79/7378 [16:26<25:15:58, 12.46s/it] + 1%| | 80/7378 [16:38<24:55:01, 12.29s/it] + +{'loss': 0.7099, 'learning_rate': 7.207207207207208e-06, 'epoch': 0.01} + + 1%| | 80/7378 [16:38<24:55:01, 12.29s/it] + 1%| | 81/7378 [16:50<24:56:33, 12.31s/it] + +{'loss': 0.757, 'learning_rate': 7.297297297297298e-06, 'epoch': 0.01} + + 1%| | 81/7378 [16:50<24:56:33, 12.31s/it] + 1%| | 82/7378 [17:03<25:01:49, 12.35s/it] + +{'loss': 0.7315, 'learning_rate': 7.387387387387388e-06, 'epoch': 0.01} + + 1%| | 82/7378 [17:03<25:01:49, 12.35s/it] + 1%| | 83/7378 [17:15<25:06:49, 12.39s/it] + +{'loss': 0.7181, 'learning_rate': 7.477477477477479e-06, 'epoch': 0.01} + + 1%| | 83/7378 [17:15<25:06:49, 12.39s/it] + 1%| | 84/7378 [17:27<25:00:35, 12.34s/it] + +{'loss': 0.7654, 'learning_rate': 7.567567567567569e-06, 'epoch': 0.01} + + 1%| | 84/7378 [17:27<25:00:35, 12.34s/it] + 1%| | 85/7378 [17:39<24:52:14, 12.28s/it] + +{'loss': 0.6348, 'learning_rate': 7.657657657657658e-06, 'epoch': 0.01} + + 1%| | 85/7378 [17:39<24:52:14, 12.28s/it] + 1%| | 86/7378 [17:52<24:57:54, 12.33s/it] + +{'loss': 0.7001, 'learning_rate': 7.747747747747749e-06, 'epoch': 0.01} + + 1%| | 86/7378 [17:52<24:57:54, 12.33s/it] + 1%| | 87/7378 [18:04<24:56:55, 12.32s/it] + +{'loss': 0.6303, 'learning_rate': 7.837837837837838e-06, 'epoch': 0.01} + + 1%| | 87/7378 [18:04<24:56:55, 12.32s/it] + 1%| | 88/7378 [18:16<24:47:40, 12.24s/it] + +{'loss': 0.7274, 'learning_rate': 7.927927927927929e-06, 'epoch': 0.01} + + 1%| | 88/7378 [18:16<24:47:40, 12.24s/it] + 1%| | 89/7378 [18:28<24:32:50, 12.12s/it] + +{'loss': 0.639, 'learning_rate': 8.018018018018018e-06, 'epoch': 0.01} + + 1%| | 89/7378 [18:28<24:32:50, 12.12s/it] + 1%| | 90/7378 
[18:41<24:49:51, 12.27s/it] + +{'loss': 0.7079, 'learning_rate': 8.108108108108109e-06, 'epoch': 0.01} + + 1%| | 90/7378 [18:41<24:49:51, 12.27s/it] + 1%| | 91/7378 [18:53<24:56:57, 12.33s/it] + +{'loss': 0.7556, 'learning_rate': 8.198198198198198e-06, 'epoch': 0.01} + + 1%| | 91/7378 [18:53<24:56:57, 12.33s/it] + 1%| | 92/7378 [19:06<24:58:16, 12.34s/it] + +{'loss': 0.7608, 'learning_rate': 8.288288288288289e-06, 'epoch': 0.01} + + 1%| | 92/7378 [19:06<24:58:16, 12.34s/it] + 1%|▏ | 93/7378 [19:18<24:52:50, 12.30s/it] + +{'loss': 0.6424, 'learning_rate': 8.378378378378378e-06, 'epoch': 0.01} + + 1%|▏ | 93/7378 [19:18<24:52:50, 12.30s/it] + 1%|▏ | 94/7378 [19:30<24:49:21, 12.27s/it] + +{'loss': 0.7258, 'learning_rate': 8.46846846846847e-06, 'epoch': 0.01} + + 1%|▏ | 94/7378 [19:30<24:49:21, 12.27s/it] + 1%|▏ | 95/7378 [19:42<24:47:01, 12.25s/it] + +{'loss': 0.5761, 'learning_rate': 8.55855855855856e-06, 'epoch': 0.01} + + 1%|▏ | 95/7378 [19:42<24:47:01, 12.25s/it] + 1%|▏ | 96/7378 [19:55<24:52:53, 12.30s/it] + +{'loss': 0.7576, 'learning_rate': 8.64864864864865e-06, 'epoch': 0.01} + + 1%|▏ | 96/7378 [19:55<24:52:53, 12.30s/it] + 1%|▏ | 97/7378 [20:07<24:45:43, 12.24s/it] + +{'loss': 0.7949, 'learning_rate': 8.738738738738739e-06, 'epoch': 0.01} + + 1%|▏ | 97/7378 [20:07<24:45:43, 12.24s/it] + 1%|▏ | 98/7378 [20:19<24:52:44, 12.30s/it] + +{'loss': 0.6406, 'learning_rate': 8.82882882882883e-06, 'epoch': 0.01} + + 1%|▏ | 98/7378 [20:19<24:52:44, 12.30s/it] + 1%|▏ | 99/7378 [20:31<24:42:49, 12.22s/it] + +{'loss': 0.6316, 'learning_rate': 8.91891891891892e-06, 'epoch': 0.01} + + 1%|▏ | 99/7378 [20:31<24:42:49, 12.22s/it] + 1%|▏ | 100/7378 [20:44<24:49:12, 12.28s/it] + +{'loss': 0.6675, 'learning_rate': 9.00900900900901e-06, 'epoch': 0.01} + + 1%|▏ | 100/7378 [20:44<24:49:12, 12.28s/it] + 1%|▏ | 101/7378 [20:56<24:41:18, 12.21s/it] + +{'loss': 0.6055, 'learning_rate': 9.0990990990991e-06, 'epoch': 0.01} + + 1%|▏ | 101/7378 [20:56<24:41:18, 12.21s/it] + 1%|▏ | 102/7378 
[21:08<24:52:45, 12.31s/it] + +{'loss': 0.6646, 'learning_rate': 9.189189189189191e-06, 'epoch': 0.01} + + 1%|▏ | 102/7378 [21:08<24:52:45, 12.31s/it] + 1%|▏ | 103/7378 [21:20<24:42:24, 12.23s/it] + +{'loss': 0.7368, 'learning_rate': 9.27927927927928e-06, 'epoch': 0.01} + + 1%|▏ | 103/7378 [21:20<24:42:24, 12.23s/it] + 1%|▏ | 104/7378 [21:32<24:45:27, 12.25s/it] + +{'loss': 0.6621, 'learning_rate': 9.36936936936937e-06, 'epoch': 0.01} + + 1%|▏ | 104/7378 [21:32<24:45:27, 12.25s/it] + 1%|▏ | 105/7378 [21:45<24:46:20, 12.26s/it] + +{'loss': 0.641, 'learning_rate': 9.45945945945946e-06, 'epoch': 0.01} + + 1%|▏ | 105/7378 [21:45<24:46:20, 12.26s/it] + 1%|▏ | 106/7378 [21:57<24:40:38, 12.22s/it] + +{'loss': 0.66, 'learning_rate': 9.54954954954955e-06, 'epoch': 0.01} + + 1%|▏ | 106/7378 [21:57<24:40:38, 12.22s/it] + 1%|▏ | 107/7378 [22:09<24:48:08, 12.28s/it] + +{'loss': 0.6109, 'learning_rate': 9.63963963963964e-06, 'epoch': 0.01} + + 1%|▏ | 107/7378 [22:09<24:48:08, 12.28s/it] + 1%|▏ | 108/7378 [22:22<24:48:58, 12.29s/it] + +{'loss': 0.6326, 'learning_rate': 9.729729729729732e-06, 'epoch': 0.01} + + 1%|▏ | 108/7378 [22:22<24:48:58, 12.29s/it] + 1%|▏ | 109/7378 [22:34<25:01:27, 12.39s/it] + +{'loss': 0.6023, 'learning_rate': 9.81981981981982e-06, 'epoch': 0.01} + + 1%|▏ | 109/7378 [22:34<25:01:27, 12.39s/it] + 1%|▏ | 110/7378 [22:47<25:03:34, 12.41s/it] + +{'loss': 0.6263, 'learning_rate': 9.90990990990991e-06, 'epoch': 0.01} + + 1%|▏ | 110/7378 [22:47<25:03:34, 12.41s/it] + 2%|▏ | 111/7378 [22:59<24:54:54, 12.34s/it] + +{'loss': 0.7207, 'learning_rate': 1e-05, 'epoch': 0.02} + + 2%|▏ | 111/7378 [22:59<24:54:54, 12.34s/it] + 2%|▏ | 112/7378 [23:11<24:56:41, 12.36s/it] + +{'loss': 0.6551, 'learning_rate': 1.009009009009009e-05, 'epoch': 0.02} + + 2%|▏ | 112/7378 [23:11<24:56:41, 12.36s/it] + 2%|▏ | 113/7378 [23:23<24:47:31, 12.29s/it] + +{'loss': 0.5665, 'learning_rate': 1.0180180180180181e-05, 'epoch': 0.02} + + 2%|▏ | 113/7378 [23:23<24:47:31, 12.29s/it] + 2%|▏ | 
114/7378 [23:35<24:37:14, 12.20s/it] + +{'loss': 0.6855, 'learning_rate': 1.027027027027027e-05, 'epoch': 0.02} + + 2%|▏ | 114/7378 [23:35<24:37:14, 12.20s/it] + 2%|▏ | 115/7378 [23:48<24:47:46, 12.29s/it] + +{'loss': 0.5805, 'learning_rate': 1.0360360360360363e-05, 'epoch': 0.02} + + 2%|▏ | 115/7378 [23:48<24:47:46, 12.29s/it] + 2%|▏ | 116/7378 [24:00<24:39:57, 12.23s/it] + +{'loss': 0.6493, 'learning_rate': 1.0450450450450452e-05, 'epoch': 0.02} + + 2%|▏ | 116/7378 [24:00<24:39:57, 12.23s/it] + 2%|▏ | 117/7378 [24:12<24:38:17, 12.22s/it] + +{'loss': 0.6844, 'learning_rate': 1.0540540540540541e-05, 'epoch': 0.02} + + 2%|▏ | 117/7378 [24:12<24:38:17, 12.22s/it] + 2%|▏ | 118/7378 [24:24<24:39:44, 12.23s/it] + +{'loss': 0.6536, 'learning_rate': 1.0630630630630632e-05, 'epoch': 0.02} + + 2%|▏ | 118/7378 [24:24<24:39:44, 12.23s/it] + 2%|▏ | 119/7378 [24:37<24:54:25, 12.35s/it] + +{'loss': 0.6154, 'learning_rate': 1.0720720720720721e-05, 'epoch': 0.02} + + 2%|▏ | 119/7378 [24:37<24:54:25, 12.35s/it] + 2%|▏ | 120/7378 [24:49<24:45:29, 12.28s/it] + +{'loss': 0.6112, 'learning_rate': 1.0810810810810812e-05, 'epoch': 0.02} + + 2%|▏ | 120/7378 [24:49<24:45:29, 12.28s/it] + 2%|▏ | 121/7378 [25:01<24:41:22, 12.25s/it] + +{'loss': 0.5803, 'learning_rate': 1.0900900900900901e-05, 'epoch': 0.02} + + 2%|▏ | 121/7378 [25:01<24:41:22, 12.25s/it] + 2%|▏ | 122/7378 [25:14<25:00:27, 12.41s/it] + +{'loss': 0.615, 'learning_rate': 1.0990990990990992e-05, 'epoch': 0.02} + + 2%|▏ | 122/7378 [25:14<25:00:27, 12.41s/it] + 2%|▏ | 123/7378 [25:27<25:07:45, 12.47s/it] + +{'loss': 0.6659, 'learning_rate': 1.1081081081081081e-05, 'epoch': 0.02} + + 2%|▏ | 123/7378 [25:27<25:07:45, 12.47s/it] + 2%|▏ | 124/7378 [25:39<24:56:33, 12.38s/it] + +{'loss': 0.66, 'learning_rate': 1.117117117117117e-05, 'epoch': 0.02} + + 2%|▏ | 124/7378 [25:39<24:56:33, 12.38s/it] + 2%|▏ | 125/7378 [25:51<24:50:34, 12.33s/it] + +{'loss': 0.674, 'learning_rate': 1.1261261261261263e-05, 'epoch': 0.02} + + 2%|▏ | 125/7378 
[25:51<24:50:34, 12.33s/it] + 2%|▏ | 126/7378 [26:04<25:05:03, 12.45s/it] + +{'loss': 0.6028, 'learning_rate': 1.1351351351351352e-05, 'epoch': 0.02} + + 2%|▏ | 126/7378 [26:04<25:05:03, 12.45s/it] + 2%|▏ | 127/7378 [26:16<24:58:46, 12.40s/it] + +{'loss': 0.6186, 'learning_rate': 1.1441441441441443e-05, 'epoch': 0.02} + + 2%|▏ | 127/7378 [26:16<24:58:46, 12.40s/it] + 2%|▏ | 128/7378 [26:29<24:59:16, 12.41s/it] + +{'loss': 0.7292, 'learning_rate': 1.1531531531531532e-05, 'epoch': 0.02} + + 2%|▏ | 128/7378 [26:29<24:59:16, 12.41s/it] + 2%|▏ | 129/7378 [26:41<24:44:52, 12.29s/it] + +{'loss': 0.6269, 'learning_rate': 1.1621621621621622e-05, 'epoch': 0.02} + + 2%|▏ | 129/7378 [26:41<24:44:52, 12.29s/it] + 2%|▏ | 130/7378 [26:53<24:52:42, 12.36s/it] + +{'loss': 0.5701, 'learning_rate': 1.1711711711711713e-05, 'epoch': 0.02} + + 2%|▏ | 130/7378 [26:53<24:52:42, 12.36s/it] + 2%|▏ | 131/7378 [27:06<24:58:44, 12.41s/it] + +{'loss': 0.6104, 'learning_rate': 1.1801801801801802e-05, 'epoch': 0.02} + + 2%|▏ | 131/7378 [27:06<24:58:44, 12.41s/it] + 2%|▏ | 132/7378 [27:18<24:46:59, 12.31s/it] + +{'loss': 0.5445, 'learning_rate': 1.1891891891891894e-05, 'epoch': 0.02} + + 2%|▏ | 132/7378 [27:18<24:46:59, 12.31s/it] + 2%|▏ | 133/7378 [27:30<24:42:24, 12.28s/it] + +{'loss': 0.6167, 'learning_rate': 1.1981981981981983e-05, 'epoch': 0.02} + + 2%|▏ | 133/7378 [27:30<24:42:24, 12.28s/it] + 2%|▏ | 134/7378 [27:42<24:32:38, 12.20s/it] + +{'loss': 0.545, 'learning_rate': 1.2072072072072074e-05, 'epoch': 0.02} + + 2%|▏ | 134/7378 [27:42<24:32:38, 12.20s/it] + 2%|▏ | 135/7378 [27:54<24:27:19, 12.16s/it] + +{'loss': 0.5665, 'learning_rate': 1.2162162162162164e-05, 'epoch': 0.02} + + 2%|▏ | 135/7378 [27:54<24:27:19, 12.16s/it] + 2%|▏ | 136/7378 [28:07<24:58:17, 12.41s/it] + +{'loss': 0.5933, 'learning_rate': 1.2252252252252253e-05, 'epoch': 0.02} + + 2%|▏ | 136/7378 [28:07<24:58:17, 12.41s/it] + 2%|▏ | 137/7378 [28:19<24:52:54, 12.37s/it] + +{'loss': 0.5975, 'learning_rate': 
1.2342342342342344e-05, 'epoch': 0.02} + + 2%|▏ | 137/7378 [28:19<24:52:54, 12.37s/it] + 2%|▏ | 138/7378 [28:31<24:42:39, 12.29s/it] + +{'loss': 0.6071, 'learning_rate': 1.2432432432432433e-05, 'epoch': 0.02} + + 2%|▏ | 138/7378 [28:31<24:42:39, 12.29s/it] + 2%|▏ | 139/7378 [28:44<24:43:15, 12.29s/it] + +{'loss': 0.6111, 'learning_rate': 1.2522522522522524e-05, 'epoch': 0.02} + + 2%|▏ | 139/7378 [28:44<24:43:15, 12.29s/it] + 2%|▏ | 140/7378 [28:56<24:48:21, 12.34s/it] + +{'loss': 0.59, 'learning_rate': 1.2612612612612613e-05, 'epoch': 0.02} + + 2%|▏ | 140/7378 [28:56<24:48:21, 12.34s/it] + 2%|▏ | 141/7378 [29:08<24:43:05, 12.30s/it] + +{'loss': 0.6014, 'learning_rate': 1.2702702702702702e-05, 'epoch': 0.02} + + 2%|▏ | 141/7378 [29:08<24:43:05, 12.30s/it] + 2%|▏ | 142/7378 [29:21<24:45:32, 12.32s/it] + +{'loss': 0.6458, 'learning_rate': 1.2792792792792795e-05, 'epoch': 0.02} + + 2%|▏ | 142/7378 [29:21<24:45:32, 12.32s/it] + 2%|▏ | 143/7378 [29:33<24:44:01, 12.31s/it] + +{'loss': 0.6205, 'learning_rate': 1.2882882882882884e-05, 'epoch': 0.02} + + 2%|▏ | 143/7378 [29:33<24:44:01, 12.31s/it] + 2%|▏ | 144/7378 [29:45<24:38:06, 12.26s/it] + +{'loss': 0.6196, 'learning_rate': 1.2972972972972975e-05, 'epoch': 0.02} + + 2%|▏ | 144/7378 [29:45<24:38:06, 12.26s/it] + 2%|▏ | 145/7378 [29:58<24:52:33, 12.38s/it] + +{'loss': 0.592, 'learning_rate': 1.3063063063063064e-05, 'epoch': 0.02} + + 2%|▏ | 145/7378 [29:58<24:52:33, 12.38s/it] + 2%|▏ | 146/7378 [30:10<24:55:34, 12.41s/it] + +{'loss': 0.6026, 'learning_rate': 1.3153153153153155e-05, 'epoch': 0.02} + + 2%|▏ | 146/7378 [30:10<24:55:34, 12.41s/it] + 2%|▏ | 147/7378 [30:23<25:01:40, 12.46s/it] + +{'loss': 0.6468, 'learning_rate': 1.3243243243243244e-05, 'epoch': 0.02} + + 2%|▏ | 147/7378 [30:23<25:01:40, 12.46s/it] + 2%|▏ | 148/7378 [30:35<24:42:57, 12.31s/it] + +{'loss': 0.5438, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.02} + + 2%|▏ | 148/7378 [30:35<24:42:57, 12.31s/it] + 2%|▏ | 149/7378 [30:47<24:40:35, 12.29s/it] 
+ +{'loss': 0.5702, 'learning_rate': 1.3423423423423426e-05, 'epoch': 0.02} + + 2%|▏ | 149/7378 [30:47<24:40:35, 12.29s/it] + 2%|▏ | 150/7378 [31:00<25:02:21, 12.47s/it] + +{'loss': 0.6098, 'learning_rate': 1.3513513513513515e-05, 'epoch': 0.02} + + 2%|▏ | 150/7378 [31:00<25:02:21, 12.47s/it] + 2%|▏ | 151/7378 [31:13<25:15:59, 12.59s/it] + +{'loss': 0.5749, 'learning_rate': 1.3603603603603606e-05, 'epoch': 0.02} + + 2%|▏ | 151/7378 [31:13<25:15:59, 12.59s/it] + 2%|▏ | 152/7378 [31:25<25:09:19, 12.53s/it] + +{'loss': 0.6567, 'learning_rate': 1.3693693693693695e-05, 'epoch': 0.02} + + 2%|▏ | 152/7378 [31:25<25:09:19, 12.53s/it] + 2%|▏ | 153/7378 [31:37<24:53:23, 12.40s/it] + +{'loss': 0.6165, 'learning_rate': 1.3783783783783784e-05, 'epoch': 0.02} + + 2%|▏ | 153/7378 [31:37<24:53:23, 12.40s/it] + 2%|▏ | 154/7378 [31:49<24:37:32, 12.27s/it] + +{'loss': 0.598, 'learning_rate': 1.3873873873873875e-05, 'epoch': 0.02} + + 2%|▏ | 154/7378 [31:49<24:37:32, 12.27s/it] + 2%|▏ | 155/7378 [32:02<24:49:21, 12.37s/it] + +{'loss': 0.5977, 'learning_rate': 1.3963963963963964e-05, 'epoch': 0.02} + + 2%|▏ | 155/7378 [32:02<24:49:21, 12.37s/it] + 2%|▏ | 156/7378 [32:14<24:42:53, 12.32s/it] + +{'loss': 0.594, 'learning_rate': 1.4054054054054055e-05, 'epoch': 0.02} + + 2%|▏ | 156/7378 [32:14<24:42:53, 12.32s/it] + 2%|▏ | 157/7378 [32:26<24:38:53, 12.29s/it] + +{'loss': 0.5512, 'learning_rate': 1.4144144144144145e-05, 'epoch': 0.02} + + 2%|▏ | 157/7378 [32:26<24:38:53, 12.29s/it] + 2%|▏ | 158/7378 [32:38<24:26:49, 12.19s/it] + +{'loss': 0.5875, 'learning_rate': 1.4234234234234234e-05, 'epoch': 0.02} + + 2%|▏ | 158/7378 [32:38<24:26:49, 12.19s/it] + 2%|▏ | 159/7378 [32:50<24:30:07, 12.22s/it] + +{'loss': 0.609, 'learning_rate': 1.4324324324324326e-05, 'epoch': 0.02} + + 2%|▏ | 159/7378 [32:51<24:30:07, 12.22s/it] + 2%|▏ | 160/7378 [33:03<24:22:54, 12.16s/it] + +{'loss': 0.6495, 'learning_rate': 1.4414414414414416e-05, 'epoch': 0.02} + + 2%|▏ | 160/7378 [33:03<24:22:54, 12.16s/it] + 2%|▏ | 
161/7378 [33:15<24:31:14, 12.23s/it] + +{'loss': 0.7259, 'learning_rate': 1.4504504504504506e-05, 'epoch': 0.02} + + 2%|▏ | 161/7378 [33:15<24:31:14, 12.23s/it] + 2%|▏ | 162/7378 [33:27<24:36:19, 12.28s/it] + +{'loss': 0.5946, 'learning_rate': 1.4594594594594596e-05, 'epoch': 0.02} + + 2%|▏ | 162/7378 [33:27<24:36:19, 12.28s/it] + 2%|▏ | 163/7378 [33:40<24:45:46, 12.36s/it] + +{'loss': 0.5952, 'learning_rate': 1.4684684684684686e-05, 'epoch': 0.02} + + 2%|▏ | 163/7378 [33:40<24:45:46, 12.36s/it] + 2%|▏ | 164/7378 [33:52<24:43:30, 12.34s/it] + +{'loss': 0.6296, 'learning_rate': 1.4774774774774776e-05, 'epoch': 0.02} + + 2%|▏ | 164/7378 [33:52<24:43:30, 12.34s/it] + 2%|▏ | 165/7378 [34:04<24:43:45, 12.34s/it] + +{'loss': 0.6335, 'learning_rate': 1.4864864864864865e-05, 'epoch': 0.02} + + 2%|▏ | 165/7378 [34:04<24:43:45, 12.34s/it] + 2%|▏ | 166/7378 [34:17<24:42:41, 12.34s/it] + +{'loss': 0.6493, 'learning_rate': 1.4954954954954957e-05, 'epoch': 0.02} + + 2%|▏ | 166/7378 [34:17<24:42:41, 12.34s/it] + 2%|▏ | 167/7378 [34:29<24:26:15, 12.20s/it] + +{'loss': 0.6309, 'learning_rate': 1.5045045045045045e-05, 'epoch': 0.02} + + 2%|▏ | 167/7378 [34:29<24:26:15, 12.20s/it] + 2%|▏ | 168/7378 [34:41<24:21:05, 12.16s/it] + +{'loss': 0.6693, 'learning_rate': 1.5135135135135138e-05, 'epoch': 0.02} + + 2%|▏ | 168/7378 [34:41<24:21:05, 12.16s/it] + 2%|▏ | 169/7378 [34:53<24:20:59, 12.16s/it] + +{'loss': 0.6108, 'learning_rate': 1.5225225225225227e-05, 'epoch': 0.02} + + 2%|▏ | 169/7378 [34:53<24:20:59, 12.16s/it] + 2%|▏ | 170/7378 [35:06<24:39:58, 12.32s/it] + +{'loss': 0.5725, 'learning_rate': 1.5315315315315316e-05, 'epoch': 0.02} + + 2%|▏ | 170/7378 [35:06<24:39:58, 12.32s/it] + 2%|▏ | 171/7378 [35:18<24:46:16, 12.37s/it] + +{'loss': 0.6435, 'learning_rate': 1.540540540540541e-05, 'epoch': 0.02} + + 2%|▏ | 171/7378 [35:18<24:46:16, 12.37s/it] + 2%|▏ | 172/7378 [35:31<24:56:54, 12.46s/it] + +{'loss': 0.6381, 'learning_rate': 1.5495495495495498e-05, 'epoch': 0.02} + + 2%|▏ | 
172/7378 [35:31<24:56:54, 12.46s/it] + 2%|▏ | 173/7378 [35:43<24:56:14, 12.46s/it] + +{'loss': 0.6125, 'learning_rate': 1.5585585585585587e-05, 'epoch': 0.02} + + 2%|▏ | 173/7378 [35:43<24:56:14, 12.46s/it] + 2%|▏ | 174/7378 [35:56<24:51:18, 12.42s/it] + +{'loss': 0.6449, 'learning_rate': 1.5675675675675676e-05, 'epoch': 0.02} + + 2%|▏ | 174/7378 [35:56<24:51:18, 12.42s/it] + 2%|▏ | 175/7378 [36:08<24:34:16, 12.28s/it] + +{'loss': 0.5813, 'learning_rate': 1.576576576576577e-05, 'epoch': 0.02} + + 2%|▏ | 175/7378 [36:08<24:34:16, 12.28s/it] + 2%|▏ | 176/7378 [36:20<24:28:53, 12.24s/it] + +{'loss': 0.5699, 'learning_rate': 1.5855855855855858e-05, 'epoch': 0.02} + + 2%|▏ | 176/7378 [36:20<24:28:53, 12.24s/it] + 2%|▏ | 177/7378 [36:32<24:37:51, 12.31s/it] + +{'loss': 0.5759, 'learning_rate': 1.5945945945945947e-05, 'epoch': 0.02} + + 2%|▏ | 177/7378 [36:32<24:37:51, 12.31s/it] + 2%|▏ | 178/7378 [36:45<24:55:13, 12.46s/it] + +{'loss': 0.6208, 'learning_rate': 1.6036036036036036e-05, 'epoch': 0.02} + + 2%|▏ | 178/7378 [36:45<24:55:13, 12.46s/it] + 2%|▏ | 179/7378 [36:57<24:43:14, 12.36s/it] + +{'loss': 0.5436, 'learning_rate': 1.6126126126126126e-05, 'epoch': 0.02} + + 2%|▏ | 179/7378 [36:57<24:43:14, 12.36s/it] + 2%|▏ | 180/7378 [37:09<24:31:12, 12.26s/it] + +{'loss': 0.5514, 'learning_rate': 1.6216216216216218e-05, 'epoch': 0.02} + + 2%|▏ | 180/7378 [37:09<24:31:12, 12.26s/it] + 2%|▏ | 181/7378 [37:22<24:40:21, 12.34s/it] + +{'loss': 0.6418, 'learning_rate': 1.6306306306306307e-05, 'epoch': 0.02} + + 2%|▏ | 181/7378 [37:22<24:40:21, 12.34s/it] + 2%|▏ | 182/7378 [37:34<24:47:26, 12.40s/it] + +{'loss': 0.6012, 'learning_rate': 1.6396396396396396e-05, 'epoch': 0.02} + + 2%|▏ | 182/7378 [37:34<24:47:26, 12.40s/it] + 2%|▏ | 183/7378 [37:46<24:41:00, 12.35s/it] + +{'loss': 0.553, 'learning_rate': 1.648648648648649e-05, 'epoch': 0.02} + + 2%|▏ | 183/7378 [37:46<24:41:00, 12.35s/it] + 2%|▏ | 184/7378 [37:59<24:33:49, 12.29s/it] + +{'loss': 0.5609, 'learning_rate': 
1.6576576576576578e-05, 'epoch': 0.02} + + 2%|▏ | 184/7378 [37:59<24:33:49, 12.29s/it] + 3%|▎ | 185/7378 [38:11<24:29:26, 12.26s/it] + +{'loss': 0.5165, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.03} + + 3%|▎ | 185/7378 [38:11<24:29:26, 12.26s/it] + 3%|▎ | 186/7378 [38:24<24:54:23, 12.47s/it] + +{'loss': 0.5483, 'learning_rate': 1.6756756756756757e-05, 'epoch': 0.03} + + 3%|▎ | 186/7378 [38:24<24:54:23, 12.47s/it] + 3%|▎ | 187/7378 [38:36<24:45:33, 12.40s/it] + +{'loss': 0.5599, 'learning_rate': 1.6846846846846846e-05, 'epoch': 0.03} + + 3%|▎ | 187/7378 [38:36<24:45:33, 12.40s/it] + 3%|▎ | 188/7378 [38:48<24:44:09, 12.39s/it] + +{'loss': 0.5684, 'learning_rate': 1.693693693693694e-05, 'epoch': 0.03} + + 3%|▎ | 188/7378 [38:48<24:44:09, 12.39s/it] + 3%|▎ | 189/7378 [39:01<24:45:04, 12.39s/it] + +{'loss': 0.5967, 'learning_rate': 1.7027027027027028e-05, 'epoch': 0.03} + + 3%|▎ | 189/7378 [39:01<24:45:04, 12.39s/it] + 3%|▎ | 190/7378 [39:13<24:42:16, 12.37s/it] + +{'loss': 0.4847, 'learning_rate': 1.711711711711712e-05, 'epoch': 0.03} + + 3%|▎ | 190/7378 [39:13<24:42:16, 12.37s/it] + 3%|▎ | 191/7378 [39:25<24:33:10, 12.30s/it] + +{'loss': 0.5768, 'learning_rate': 1.720720720720721e-05, 'epoch': 0.03} + + 3%|▎ | 191/7378 [39:25<24:33:10, 12.30s/it] + 3%|▎ | 192/7378 [39:37<24:23:26, 12.22s/it] + +{'loss': 0.547, 'learning_rate': 1.72972972972973e-05, 'epoch': 0.03} + + 3%|▎ | 192/7378 [39:37<24:23:26, 12.22s/it] + 3%|▎ | 193/7378 [39:49<24:21:44, 12.21s/it] + +{'loss': 0.5809, 'learning_rate': 1.7387387387387388e-05, 'epoch': 0.03} + + 3%|▎ | 193/7378 [39:49<24:21:44, 12.21s/it] + 3%|▎ | 194/7378 [40:02<24:35:37, 12.32s/it] + +{'loss': 0.6113, 'learning_rate': 1.7477477477477477e-05, 'epoch': 0.03} + + 3%|▎ | 194/7378 [40:02<24:35:37, 12.32s/it] + 3%|▎ | 195/7378 [40:14<24:34:18, 12.31s/it] + +{'loss': 0.5158, 'learning_rate': 1.756756756756757e-05, 'epoch': 0.03} + + 3%|▎ | 195/7378 [40:14<24:34:18, 12.31s/it] + 3%|▎ | 196/7378 [40:26<24:25:45, 12.25s/it] + 
+{'loss': 0.6834, 'learning_rate': 1.765765765765766e-05, 'epoch': 0.03} + + 3%|▎ | 196/7378 [40:26<24:25:45, 12.25s/it] + 3%|▎ | 197/7378 [40:39<24:32:36, 12.30s/it] + +{'loss': 0.5766, 'learning_rate': 1.774774774774775e-05, 'epoch': 0.03} + + 3%|▎ | 197/7378 [40:39<24:32:36, 12.30s/it] + 3%|▎ | 198/7378 [40:51<24:33:48, 12.32s/it] + +{'loss': 0.5758, 'learning_rate': 1.783783783783784e-05, 'epoch': 0.03} + + 3%|▎ | 198/7378 [40:51<24:33:48, 12.32s/it] + 3%|▎ | 199/7378 [41:03<24:34:24, 12.32s/it] + +{'loss': 0.6095, 'learning_rate': 1.792792792792793e-05, 'epoch': 0.03} + + 3%|▎ | 199/7378 [41:03<24:34:24, 12.32s/it] + 3%|▎ | 200/7378 [41:16<24:36:42, 12.34s/it] + +{'loss': 0.6351, 'learning_rate': 1.801801801801802e-05, 'epoch': 0.03} + + 3%|▎ | 200/7378 [41:16<24:36:42, 12.34s/it] + 3%|▎ | 201/7378 [41:29<24:48:05, 12.44s/it] + +{'loss': 0.5774, 'learning_rate': 1.8108108108108108e-05, 'epoch': 0.03} + + 3%|▎ | 201/7378 [41:29<24:48:05, 12.44s/it] + 3%|▎ | 202/7378 [41:41<24:50:07, 12.46s/it] + +{'loss': 0.6493, 'learning_rate': 1.81981981981982e-05, 'epoch': 0.03} + + 3%|▎ | 202/7378 [41:41<24:50:07, 12.46s/it] + 3%|▎ | 203/7378 [41:54<24:57:33, 12.52s/it] + +{'loss': 0.5991, 'learning_rate': 1.828828828828829e-05, 'epoch': 0.03} + + 3%|▎ | 203/7378 [41:54<24:57:33, 12.52s/it] + 3%|▎ | 204/7378 [42:06<24:33:06, 12.32s/it] + +{'loss': 0.5589, 'learning_rate': 1.8378378378378383e-05, 'epoch': 0.03} + + 3%|▎ | 204/7378 [42:06<24:33:06, 12.32s/it] + 3%|▎ | 205/7378 [42:18<24:30:18, 12.30s/it] + +{'loss': 0.5707, 'learning_rate': 1.8468468468468472e-05, 'epoch': 0.03} + + 3%|▎ | 205/7378 [42:18<24:30:18, 12.30s/it] + 3%|▎ | 206/7378 [42:30<24:26:09, 12.27s/it] + +{'loss': 0.5346, 'learning_rate': 1.855855855855856e-05, 'epoch': 0.03} + + 3%|▎ | 206/7378 [42:30<24:26:09, 12.27s/it] + 3%|▎ | 207/7378 [42:42<24:27:59, 12.28s/it] + +{'loss': 0.5728, 'learning_rate': 1.864864864864865e-05, 'epoch': 0.03} + + 3%|▎ | 207/7378 [42:42<24:27:59, 12.28s/it] + 3%|▎ | 208/7378 
[42:54<24:23:24, 12.25s/it] + +{'loss': 0.5031, 'learning_rate': 1.873873873873874e-05, 'epoch': 0.03} + + 3%|▎ | 208/7378 [42:54<24:23:24, 12.25s/it] + 3%|▎ | 209/7378 [43:07<24:38:33, 12.37s/it] + +{'loss': 0.5679, 'learning_rate': 1.8828828828828832e-05, 'epoch': 0.03} + + 3%|▎ | 209/7378 [43:07<24:38:33, 12.37s/it] + 3%|▎ | 210/7378 [43:19<24:34:42, 12.34s/it] + +{'loss': 0.5881, 'learning_rate': 1.891891891891892e-05, 'epoch': 0.03} + + 3%|▎ | 210/7378 [43:19<24:34:42, 12.34s/it] + 3%|▎ | 211/7378 [43:32<24:27:20, 12.28s/it] + +{'loss': 0.5853, 'learning_rate': 1.900900900900901e-05, 'epoch': 0.03} + + 3%|▎ | 211/7378 [43:32<24:27:20, 12.28s/it] + 3%|▎ | 212/7378 [43:44<24:19:07, 12.22s/it] + +{'loss': 0.5624, 'learning_rate': 1.90990990990991e-05, 'epoch': 0.03} + + 3%|▎ | 212/7378 [43:44<24:19:07, 12.22s/it] + 3%|▎ | 213/7378 [43:56<24:32:04, 12.33s/it] + +{'loss': 0.589, 'learning_rate': 1.918918918918919e-05, 'epoch': 0.03} + + 3%|▎ | 213/7378 [43:56<24:32:04, 12.33s/it] + 3%|▎ | 214/7378 [44:09<24:40:23, 12.40s/it] + +{'loss': 0.522, 'learning_rate': 1.927927927927928e-05, 'epoch': 0.03} + + 3%|▎ | 214/7378 [44:09<24:40:23, 12.40s/it] + 3%|▎ | 215/7378 [44:21<24:36:07, 12.36s/it] + +{'loss': 0.6159, 'learning_rate': 1.936936936936937e-05, 'epoch': 0.03} + + 3%|▎ | 215/7378 [44:21<24:36:07, 12.36s/it] + 3%|▎ | 216/7378 [44:33<24:28:47, 12.30s/it] + +{'loss': 0.5628, 'learning_rate': 1.9459459459459463e-05, 'epoch': 0.03} + + 3%|▎ | 216/7378 [44:33<24:28:47, 12.30s/it] + 3%|▎ | 217/7378 [44:46<24:34:10, 12.35s/it] + +{'loss': 0.5342, 'learning_rate': 1.9549549549549552e-05, 'epoch': 0.03} + + 3%|▎ | 217/7378 [44:46<24:34:10, 12.35s/it] + 3%|▎ | 218/7378 [44:58<24:34:12, 12.35s/it] + +{'loss': 0.5911, 'learning_rate': 1.963963963963964e-05, 'epoch': 0.03} + + 3%|▎ | 218/7378 [44:58<24:34:12, 12.35s/it] + 3%|▎ | 219/7378 [45:10<24:24:43, 12.28s/it] + +{'loss': 0.5204, 'learning_rate': 1.972972972972973e-05, 'epoch': 0.03} + + 3%|▎ | 219/7378 [45:10<24:24:43, 
12.28s/it] + 3%|▎ | 220/7378 [45:22<24:23:19, 12.27s/it] + +{'loss': 0.4581, 'learning_rate': 1.981981981981982e-05, 'epoch': 0.03} + + 3%|▎ | 220/7378 [45:22<24:23:19, 12.27s/it] + 3%|▎ | 221/7378 [45:35<24:29:21, 12.32s/it] + +{'loss': 0.5821, 'learning_rate': 1.9909909909909912e-05, 'epoch': 0.03} + + 3%|▎ | 221/7378 [45:35<24:29:21, 12.32s/it] + 3%|▎ | 222/7378 [45:48<24:46:09, 12.46s/it] + +{'loss': 0.547, 'learning_rate': 2e-05, 'epoch': 0.03} + + 3%|▎ | 222/7378 [45:48<24:46:09, 12.46s/it] + 3%|▎ | 223/7378 [46:00<24:43:11, 12.44s/it] + +{'loss': 0.6075, 'learning_rate': 1.999999903632836e-05, 'epoch': 0.03} + + 3%|▎ | 223/7378 [46:00<24:43:11, 12.44s/it] + 3%|▎ | 224/7378 [46:12<24:43:21, 12.44s/it] + +{'loss': 0.5605, 'learning_rate': 1.9999996145313622e-05, 'epoch': 0.03} + + 3%|▎ | 224/7378 [46:12<24:43:21, 12.44s/it] + 3%|▎ | 225/7378 [46:25<24:30:41, 12.34s/it] + +{'loss': 0.5845, 'learning_rate': 1.9999991326956344e-05, 'epoch': 0.03} + + 3%|▎ | 225/7378 [46:25<24:30:41, 12.34s/it] + 3%|▎ | 226/7378 [46:37<24:34:24, 12.37s/it] + +{'loss': 0.5591, 'learning_rate': 1.9999984581257452e-05, 'epoch': 0.03} + + 3%|▎ | 226/7378 [46:37<24:34:24, 12.37s/it] + 3%|▎ | 227/7378 [46:49<24:16:44, 12.22s/it] + +{'loss': 0.5107, 'learning_rate': 1.999997590821825e-05, 'epoch': 0.03} + + 3%|▎ | 227/7378 [46:49<24:16:44, 12.22s/it] + 3%|▎ | 228/7378 [47:01<24:26:52, 12.31s/it] + +{'loss': 0.5784, 'learning_rate': 1.999996530784041e-05, 'epoch': 0.03} + + 3%|▎ | 228/7378 [47:01<24:26:52, 12.31s/it] + 3%|▎ | 229/7378 [47:14<24:28:31, 12.33s/it] + +{'loss': 0.6033, 'learning_rate': 1.999995278012597e-05, 'epoch': 0.03} + + 3%|▎ | 229/7378 [47:14<24:28:31, 12.33s/it] + 3%|▎ | 230/7378 [47:26<24:29:45, 12.34s/it] + +{'loss': 0.6053, 'learning_rate': 1.999993832507735e-05, 'epoch': 0.03} + + 3%|▎ | 230/7378 [47:26<24:29:45, 12.34s/it] + 3%|▎ | 231/7378 [47:38<24:21:02, 12.27s/it] + +{'loss': 0.5573, 'learning_rate': 1.9999921942697335e-05, 'epoch': 0.03} + + 3%|▎ | 231/7378 
[47:38<24:21:02, 12.27s/it] + 3%|▎ | 232/7378 [47:50<24:17:15, 12.24s/it] + +{'loss': 0.5644, 'learning_rate': 1.999990363298908e-05, 'epoch': 0.03} + + 3%|▎ | 232/7378 [47:50<24:17:15, 12.24s/it] + 3%|▎ | 233/7378 [48:03<24:23:02, 12.29s/it] + +{'loss': 0.5456, 'learning_rate': 1.9999883395956114e-05, 'epoch': 0.03} + + 3%|▎ | 233/7378 [48:03<24:23:02, 12.29s/it] + 3%|▎ | 234/7378 [48:15<24:11:45, 12.19s/it] + +{'loss': 0.5494, 'learning_rate': 1.999986123160234e-05, 'epoch': 0.03} + + 3%|▎ | 234/7378 [48:15<24:11:45, 12.19s/it] + 3%|▎ | 235/7378 [48:27<24:20:18, 12.27s/it] + +{'loss': 0.5448, 'learning_rate': 1.9999837139932027e-05, 'epoch': 0.03} + + 3%|▎ | 235/7378 [48:27<24:20:18, 12.27s/it] + 3%|▎ | 236/7378 [48:40<24:31:22, 12.36s/it] + +{'loss': 0.553, 'learning_rate': 1.9999811120949818e-05, 'epoch': 0.03} + + 3%|▎ | 236/7378 [48:40<24:31:22, 12.36s/it] + 3%|▎ | 237/7378 [48:52<24:38:38, 12.42s/it] + +{'loss': 0.6188, 'learning_rate': 1.9999783174660733e-05, 'epoch': 0.03} + + 3%|▎ | 237/7378 [48:52<24:38:38, 12.42s/it] + 3%|▎ | 238/7378 [49:05<24:47:24, 12.50s/it] + +{'loss': 0.5879, 'learning_rate': 1.9999753301070156e-05, 'epoch': 0.03} + + 3%|▎ | 238/7378 [49:05<24:47:24, 12.50s/it] + 3%|▎ | 239/7378 [49:17<24:41:11, 12.45s/it] + +{'loss': 0.4985, 'learning_rate': 1.999972150018384e-05, 'epoch': 0.03} + + 3%|▎ | 239/7378 [49:17<24:41:11, 12.45s/it] + 3%|▎ | 240/7378 [49:30<24:36:20, 12.41s/it] + +{'loss': 0.5568, 'learning_rate': 1.9999687772007917e-05, 'epoch': 0.03} + + 3%|▎ | 240/7378 [49:30<24:36:20, 12.41s/it] + 3%|▎ | 241/7378 [49:42<24:41:58, 12.46s/it] + +{'loss': 0.6011, 'learning_rate': 1.999965211654889e-05, 'epoch': 0.03} + + 3%|▎ | 241/7378 [49:42<24:41:58, 12.46s/it] + 3%|▎ | 242/7378 [49:55<24:50:54, 12.54s/it] + +{'loss': 0.5601, 'learning_rate': 1.999961453381363e-05, 'epoch': 0.03} + + 3%|▎ | 242/7378 [49:55<24:50:54, 12.54s/it] + 3%|▎ | 243/7378 [50:08<24:57:48, 12.60s/it] + +{'loss': 0.5012, 'learning_rate': 1.9999575023809377e-05, 
'epoch': 0.03} + + 3%|▎ | 243/7378 [50:08<24:57:48, 12.60s/it] + 3%|▎ | 244/7378 [50:20<24:52:45, 12.55s/it] + +{'loss': 0.535, 'learning_rate': 1.999953358654375e-05, 'epoch': 0.03} + + 3%|▎ | 244/7378 [50:20<24:52:45, 12.55s/it] + 3%|▎ | 245/7378 [50:32<24:45:21, 12.49s/it] + +{'loss': 0.5614, 'learning_rate': 1.9999490222024733e-05, 'epoch': 0.03} + + 3%|▎ | 245/7378 [50:32<24:45:21, 12.49s/it] + 3%|▎ | 246/7378 [50:45<24:42:28, 12.47s/it] + +{'loss': 0.5662, 'learning_rate': 1.9999444930260684e-05, 'epoch': 0.03} + + 3%|▎ | 246/7378 [50:45<24:42:28, 12.47s/it] + 3%|▎ | 247/7378 [50:57<24:44:23, 12.49s/it] + +{'loss': 0.5387, 'learning_rate': 1.9999397711260334e-05, 'epoch': 0.03} + + 3%|▎ | 247/7378 [50:57<24:44:23, 12.49s/it] + 3%|▎ | 248/7378 [51:10<24:31:10, 12.38s/it] + +{'loss': 0.5536, 'learning_rate': 1.9999348565032784e-05, 'epoch': 0.03} + + 3%|▎ | 248/7378 [51:10<24:31:10, 12.38s/it] + 3%|▎ | 249/7378 [51:22<24:30:16, 12.37s/it] + +{'loss': 0.4986, 'learning_rate': 1.9999297491587502e-05, 'epoch': 0.03} + + 3%|▎ | 249/7378 [51:22<24:30:16, 12.37s/it] + 3%|▎ | 250/7378 [51:35<24:46:05, 12.51s/it] + +{'loss': 0.5207, 'learning_rate': 1.9999244490934337e-05, 'epoch': 0.03} + + 3%|▎ | 250/7378 [51:35<24:46:05, 12.51s/it] + 3%|▎ | 251/7378 [51:47<24:37:53, 12.44s/it] + +{'loss': 0.5435, 'learning_rate': 1.99991895630835e-05, 'epoch': 0.03} + + 3%|▎ | 251/7378 [51:47<24:37:53, 12.44s/it] + 3%|▎ | 252/7378 [51:59<24:34:02, 12.41s/it] + +{'loss': 0.5987, 'learning_rate': 1.9999132708045578e-05, 'epoch': 0.03} + + 3%|▎ | 252/7378 [51:59<24:34:02, 12.41s/it] + 3%|▎ | 253/7378 [52:11<24:22:57, 12.32s/it] + +{'loss': 0.5389, 'learning_rate': 1.999907392583153e-05, 'epoch': 0.03} + + 3%|▎ | 253/7378 [52:11<24:22:57, 12.32s/it] + 3%|▎ | 254/7378 [52:24<24:13:53, 12.24s/it] + +{'loss': 0.5042, 'learning_rate': 1.9999013216452688e-05, 'epoch': 0.03} + + 3%|▎ | 254/7378 [52:24<24:13:53, 12.24s/it] + 3%|▎ | 255/7378 [52:36<24:13:05, 12.24s/it] + +{'loss': 0.5561, 
'learning_rate': 1.9998950579920748e-05, 'epoch': 0.03} + + 3%|▎ | 255/7378 [52:36<24:13:05, 12.24s/it] + 3%|▎ | 256/7378 [52:48<24:16:32, 12.27s/it] + +{'loss': 0.5302, 'learning_rate': 1.9998886016247784e-05, 'epoch': 0.03} + + 3%|▎ | 256/7378 [52:48<24:16:32, 12.27s/it] + 3%|▎ | 257/7378 [53:00<24:14:35, 12.26s/it] + +{'loss': 0.5596, 'learning_rate': 1.999881952544624e-05, 'epoch': 0.03} + + 3%|▎ | 257/7378 [53:00<24:14:35, 12.26s/it] + 3%|▎ | 258/7378 [53:13<24:11:36, 12.23s/it] + +{'loss': 0.5454, 'learning_rate': 1.9998751107528934e-05, 'epoch': 0.03} + + 3%|▎ | 258/7378 [53:13<24:11:36, 12.23s/it] + 4%|▎ | 259/7378 [53:25<24:26:30, 12.36s/it] + +{'loss': 0.6012, 'learning_rate': 1.9998680762509045e-05, 'epoch': 0.04} + + 4%|▎ | 259/7378 [53:25<24:26:30, 12.36s/it] + 4%|▎ | 260/7378 [53:38<24:25:52, 12.36s/it] + +{'loss': 0.577, 'learning_rate': 1.9998608490400137e-05, 'epoch': 0.04} + + 4%|▎ | 260/7378 [53:38<24:25:52, 12.36s/it] + 4%|▎ | 261/7378 [53:49<24:07:29, 12.20s/it] + +{'loss': 0.5476, 'learning_rate': 1.999853429121614e-05, 'epoch': 0.04} + + 4%|▎ | 261/7378 [53:49<24:07:29, 12.20s/it] + 4%|▎ | 262/7378 [54:02<24:17:45, 12.29s/it] + +{'loss': 0.5743, 'learning_rate': 1.999845816497135e-05, 'epoch': 0.04} + + 4%|▎ | 262/7378 [54:02<24:17:45, 12.29s/it] + 4%|▎ | 263/7378 [54:14<24:19:13, 12.31s/it] + +{'loss': 0.6189, 'learning_rate': 1.999838011168044e-05, 'epoch': 0.04} + + 4%|▎ | 263/7378 [54:14<24:19:13, 12.31s/it] + 4%|▎ | 264/7378 [54:26<23:59:04, 12.14s/it] + +{'loss': 0.5363, 'learning_rate': 1.9998300131358457e-05, 'epoch': 0.04} + + 4%|▎ | 264/7378 [54:26<23:59:04, 12.14s/it] + 4%|▎ | 265/7378 [54:39<24:16:29, 12.29s/it] + +{'loss': 0.5759, 'learning_rate': 1.999821822402081e-05, 'epoch': 0.04} + + 4%|▎ | 265/7378 [54:39<24:16:29, 12.29s/it] + 4%|▎ | 266/7378 [54:51<24:18:46, 12.31s/it] + +{'loss': 0.6563, 'learning_rate': 1.9998134389683295e-05, 'epoch': 0.04} + + 4%|▎ | 266/7378 [54:51<24:18:46, 12.31s/it] + 4%|▎ | 267/7378 
[55:03<24:06:36, 12.21s/it] + +{'loss': 0.6118, 'learning_rate': 1.9998048628362063e-05, 'epoch': 0.04} + + 4%|▎ | 267/7378 [55:03<24:06:36, 12.21s/it] + 4%|▎ | 268/7378 [55:15<24:13:13, 12.26s/it] + +{'loss': 0.5746, 'learning_rate': 1.9997960940073643e-05, 'epoch': 0.04} + + 4%|▎ | 268/7378 [55:15<24:13:13, 12.26s/it] + 4%|▎ | 269/7378 [55:28<24:22:23, 12.34s/it] + +{'loss': 0.5311, 'learning_rate': 1.9997871324834937e-05, 'epoch': 0.04} + + 4%|▎ | 269/7378 [55:28<24:22:23, 12.34s/it] + 4%|▎ | 270/7378 [55:41<24:33:43, 12.44s/it] + +{'loss': 0.5486, 'learning_rate': 1.9997779782663217e-05, 'epoch': 0.04} + + 4%|▎ | 270/7378 [55:41<24:33:43, 12.44s/it] + 4%|▎ | 271/7378 [55:53<24:22:23, 12.35s/it] + +{'loss': 0.5845, 'learning_rate': 1.9997686313576125e-05, 'epoch': 0.04} + + 4%|▎ | 271/7378 [55:53<24:22:23, 12.35s/it] + 4%|▎ | 272/7378 [56:05<24:10:50, 12.25s/it] + +{'loss': 0.548, 'learning_rate': 1.999759091759168e-05, 'epoch': 0.04} + + 4%|▎ | 272/7378 [56:05<24:10:50, 12.25s/it] + 4%|▎ | 273/7378 [56:17<24:27:47, 12.40s/it] + +{'loss': 0.5867, 'learning_rate': 1.999749359472826e-05, 'epoch': 0.04} + + 4%|▎ | 273/7378 [56:17<24:27:47, 12.40s/it] + 4%|▎ | 274/7378 [56:30<24:36:40, 12.47s/it] + +{'loss': 0.5683, 'learning_rate': 1.999739434500463e-05, 'epoch': 0.04} + + 4%|▎ | 274/7378 [56:30<24:36:40, 12.47s/it] + 4%|▎ | 275/7378 [56:43<24:37:58, 12.48s/it] + +{'loss': 0.5167, 'learning_rate': 1.9997293168439915e-05, 'epoch': 0.04} + + 4%|▎ | 275/7378 [56:43<24:37:58, 12.48s/it] + 4%|▎ | 276/7378 [56:55<24:34:30, 12.46s/it] + +{'loss': 0.5623, 'learning_rate': 1.999719006505362e-05, 'epoch': 0.04} + + 4%|▎ | 276/7378 [56:55<24:34:30, 12.46s/it] + 4%|▍ | 277/7378 [57:08<24:41:45, 12.52s/it] + +{'loss': 0.5517, 'learning_rate': 1.9997085034865605e-05, 'epoch': 0.04} + + 4%|▍ | 277/7378 [57:08<24:41:45, 12.52s/it] + 4%|▍ | 278/7378 [57:20<24:37:09, 12.48s/it] + +{'loss': 0.6517, 'learning_rate': 1.999697807789613e-05, 'epoch': 0.04} + + 4%|▍ | 278/7378 
[57:20<24:37:09, 12.48s/it] + 4%|▍ | 279/7378 [57:32<24:17:12, 12.32s/it] + +{'loss': 0.5588, 'learning_rate': 1.9996869194165796e-05, 'epoch': 0.04} + + 4%|▍ | 279/7378 [57:32<24:17:12, 12.32s/it] + 4%|▍ | 280/7378 [57:44<24:06:07, 12.22s/it] + +{'loss': 0.585, 'learning_rate': 1.999675838369559e-05, 'epoch': 0.04} + + 4%|▍ | 280/7378 [57:44<24:06:07, 12.22s/it] + 4%|▍ | 281/7378 [57:56<24:07:11, 12.23s/it] + +{'loss': 0.5453, 'learning_rate': 1.9996645646506876e-05, 'epoch': 0.04} + + 4%|▍ | 281/7378 [57:56<24:07:11, 12.23s/it] + 4%|▍ | 282/7378 [58:09<24:13:19, 12.29s/it] + +{'loss': 0.5033, 'learning_rate': 1.9996530982621376e-05, 'epoch': 0.04} + + 4%|▍ | 282/7378 [58:09<24:13:19, 12.29s/it] + 4%|▍ | 283/7378 [58:21<24:23:54, 12.38s/it] + +{'loss': 0.6125, 'learning_rate': 1.9996414392061192e-05, 'epoch': 0.04} + + 4%|▍ | 283/7378 [58:21<24:23:54, 12.38s/it] + 4%|▍ | 284/7378 [58:33<24:19:05, 12.34s/it] + +{'loss': 0.5361, 'learning_rate': 1.9996295874848794e-05, 'epoch': 0.04} + + 4%|▍ | 284/7378 [58:33<24:19:05, 12.34s/it] + 4%|▍ | 285/7378 [58:45<24:05:14, 12.23s/it] + +{'loss': 0.533, 'learning_rate': 1.9996175431007025e-05, 'epoch': 0.04} + + 4%|▍ | 285/7378 [58:45<24:05:14, 12.23s/it] + 4%|▍ | 286/7378 [58:58<24:02:40, 12.21s/it] + +{'loss': 0.6055, 'learning_rate': 1.99960530605591e-05, 'epoch': 0.04} + + 4%|▍ | 286/7378 [58:58<24:02:40, 12.21s/it] + 4%|▍ | 287/7378 [59:10<24:13:45, 12.30s/it] + +{'loss': 0.5952, 'learning_rate': 1.9995928763528603e-05, 'epoch': 0.04} + + 4%|▍ | 287/7378 [59:10<24:13:45, 12.30s/it] + 4%|▍ | 288/7378 [59:23<24:43:49, 12.56s/it] + +{'loss': 0.5543, 'learning_rate': 1.999580253993949e-05, 'epoch': 0.04} + + 4%|▍ | 288/7378 [59:23<24:43:49, 12.56s/it] + 4%|▍ | 289/7378 [59:36<24:33:48, 12.47s/it] + +{'loss': 0.5184, 'learning_rate': 1.9995674389816087e-05, 'epoch': 0.04} + + 4%|▍ | 289/7378 [59:36<24:33:48, 12.47s/it] + 4%|▍ | 290/7378 [59:48<24:44:24, 12.57s/it] + +{'loss': 0.4994, 'learning_rate': 1.9995544313183096e-05, 
'epoch': 0.04} + + 4%|▍ | 290/7378 [59:48<24:44:24, 12.57s/it] + 4%|▍ | 291/7378 [1:00:01<24:45:35, 12.58s/it] + +{'loss': 0.5475, 'learning_rate': 1.9995412310065583e-05, 'epoch': 0.04} + + 4%|▍ | 291/7378 [1:00:01<24:45:35, 12.58s/it] + 4%|▍ | 292/7378 [1:00:13<24:37:44, 12.51s/it] + +{'loss': 0.4989, 'learning_rate': 1.9995278380488994e-05, 'epoch': 0.04} + + 4%|▍ | 292/7378 [1:00:13<24:37:44, 12.51s/it] + 4%|▍ | 293/7378 [1:00:26<24:29:21, 12.44s/it] + +{'loss': 0.5813, 'learning_rate': 1.999514252447914e-05, 'epoch': 0.04} + + 4%|▍ | 293/7378 [1:00:26<24:29:21, 12.44s/it] + 4%|▍ | 294/7378 [1:00:38<24:19:47, 12.36s/it] + +{'loss': 0.5152, 'learning_rate': 1.9995004742062206e-05, 'epoch': 0.04} + + 4%|▍ | 294/7378 [1:00:38<24:19:47, 12.36s/it] + 4%|▍ | 295/7378 [1:00:50<24:16:38, 12.34s/it] + +{'loss': 0.5687, 'learning_rate': 1.9994865033264744e-05, 'epoch': 0.04} + + 4%|▍ | 295/7378 [1:00:50<24:16:38, 12.34s/it] + 4%|▍ | 296/7378 [1:01:03<24:22:54, 12.39s/it] + +{'loss': 0.6138, 'learning_rate': 1.9994723398113688e-05, 'epoch': 0.04} + + 4%|▍ | 296/7378 [1:01:03<24:22:54, 12.39s/it] + 4%|▍ | 297/7378 [1:01:15<24:16:02, 12.34s/it] + +{'loss': 0.5558, 'learning_rate': 1.999457983663633e-05, 'epoch': 0.04} + + 4%|▍ | 297/7378 [1:01:15<24:16:02, 12.34s/it] + 4%|▍ | 298/7378 [1:01:27<24:08:08, 12.27s/it] + +{'loss': 0.5506, 'learning_rate': 1.9994434348860337e-05, 'epoch': 0.04} + + 4%|▍ | 298/7378 [1:01:27<24:08:08, 12.27s/it] + 4%|▍ | 299/7378 [1:01:40<24:25:57, 12.43s/it] + +{'loss': 0.5601, 'learning_rate': 1.9994286934813754e-05, 'epoch': 0.04} + + 4%|▍ | 299/7378 [1:01:40<24:25:57, 12.43s/it] + 4%|▍ | 300/7378 [1:01:52<24:23:48, 12.41s/it] + +{'loss': 0.5559, 'learning_rate': 1.9994137594524992e-05, 'epoch': 0.04} + + 4%|▍ | 300/7378 [1:01:52<24:23:48, 12.41s/it] + 4%|▍ | 301/7378 [1:02:04<24:22:25, 12.40s/it] + +{'loss': 0.5417, 'learning_rate': 1.999398632802284e-05, 'epoch': 0.04} + + 4%|▍ | 301/7378 [1:02:04<24:22:25, 12.40s/it] + 4%|▍ | 302/7378 
[1:02:16<23:59:28, 12.21s/it] + +{'loss': 0.5251, 'learning_rate': 1.999383313533644e-05, 'epoch': 0.04} + + 4%|▍ | 302/7378 [1:02:16<23:59:28, 12.21s/it] + 4%|▍ | 303/7378 [1:02:29<24:08:06, 12.28s/it] + +{'loss': 0.5427, 'learning_rate': 1.999367801649532e-05, 'epoch': 0.04} + + 4%|▍ | 303/7378 [1:02:29<24:08:06, 12.28s/it] + 4%|▍ | 304/7378 [1:02:41<24:11:15, 12.31s/it] + +{'loss': 0.5375, 'learning_rate': 1.9993520971529388e-05, 'epoch': 0.04} + + 4%|▍ | 304/7378 [1:02:41<24:11:15, 12.31s/it] + 4%|▍ | 305/7378 [1:02:53<24:16:43, 12.36s/it] + +{'loss': 0.5339, 'learning_rate': 1.9993362000468897e-05, 'epoch': 0.04} + + 4%|▍ | 305/7378 [1:02:53<24:16:43, 12.36s/it] + 4%|▍ | 306/7378 [1:03:06<24:20:47, 12.39s/it] + +{'loss': 0.5086, 'learning_rate': 1.99932011033445e-05, 'epoch': 0.04} + + 4%|▍ | 306/7378 [1:03:06<24:20:47, 12.39s/it] + 4%|▍ | 307/7378 [1:03:18<24:15:04, 12.35s/it] + +{'loss': 0.5341, 'learning_rate': 1.9993038280187197e-05, 'epoch': 0.04} + + 4%|▍ | 307/7378 [1:03:18<24:15:04, 12.35s/it] + 4%|▍ | 308/7378 [1:03:30<24:02:36, 12.24s/it] + +{'loss': 0.5445, 'learning_rate': 1.9992873531028372e-05, 'epoch': 0.04} + + 4%|▍ | 308/7378 [1:03:30<24:02:36, 12.24s/it] + 4%|▍ | 309/7378 [1:03:43<24:11:10, 12.32s/it] + +{'loss': 0.4746, 'learning_rate': 1.9992706855899785e-05, 'epoch': 0.04} + + 4%|▍ | 309/7378 [1:03:43<24:11:10, 12.32s/it] + 4%|▍ | 310/7378 [1:03:55<24:01:17, 12.24s/it] + +{'loss': 0.5722, 'learning_rate': 1.9992538254833548e-05, 'epoch': 0.04} + + 4%|▍ | 310/7378 [1:03:55<24:01:17, 12.24s/it] + 4%|▍ | 311/7378 [1:04:08<24:22:34, 12.42s/it] + +{'loss': 0.5072, 'learning_rate': 1.9992367727862166e-05, 'epoch': 0.04} + + 4%|▍ | 311/7378 [1:04:08<24:22:34, 12.42s/it] + 4%|▍ | 312/7378 [1:04:20<24:20:37, 12.40s/it] + +{'loss': 0.5056, 'learning_rate': 1.99921952750185e-05, 'epoch': 0.04} + + 4%|▍ | 312/7378 [1:04:20<24:20:37, 12.40s/it] + 4%|▍ | 313/7378 [1:04:32<24:15:51, 12.36s/it] + +{'loss': 0.6061, 'learning_rate': 1.999202089633579e-05, 
'epoch': 0.04} + + 4%|▍ | 313/7378 [1:04:32<24:15:51, 12.36s/it] + 4%|▍ | 314/7378 [1:04:44<24:04:01, 12.27s/it] + +{'loss': 0.5923, 'learning_rate': 1.9991844591847644e-05, 'epoch': 0.04} + + 4%|▍ | 314/7378 [1:04:44<24:04:01, 12.27s/it] + 4%|▍ | 315/7378 [1:04:57<24:04:05, 12.27s/it] + +{'loss': 0.5471, 'learning_rate': 1.9991666361588042e-05, 'epoch': 0.04} + + 4%|▍ | 315/7378 [1:04:57<24:04:05, 12.27s/it] + 4%|▍ | 316/7378 [1:05:09<24:11:15, 12.33s/it] + +{'loss': 0.5044, 'learning_rate': 1.9991486205591334e-05, 'epoch': 0.04} + + 4%|▍ | 316/7378 [1:05:09<24:11:15, 12.33s/it] + 4%|▍ | 317/7378 [1:05:21<24:09:14, 12.31s/it] + +{'loss': 0.4451, 'learning_rate': 1.9991304123892243e-05, 'epoch': 0.04} + + 4%|▍ | 317/7378 [1:05:21<24:09:14, 12.31s/it] + 4%|▍ | 318/7378 [1:05:34<24:14:35, 12.36s/it] + +{'loss': 0.5344, 'learning_rate': 1.9991120116525866e-05, 'epoch': 0.04} + + 4%|▍ | 318/7378 [1:05:34<24:14:35, 12.36s/it] + 4%|▍ | 319/7378 [1:05:46<24:10:12, 12.33s/it] + +{'loss': 0.5574, 'learning_rate': 1.999093418352766e-05, 'epoch': 0.04} + + 4%|▍ | 319/7378 [1:05:46<24:10:12, 12.33s/it] + 4%|▍ | 320/7378 [1:05:58<24:15:19, 12.37s/it] + +{'loss': 0.5284, 'learning_rate': 1.999074632493347e-05, 'epoch': 0.04} + + 4%|▍ | 320/7378 [1:05:58<24:15:19, 12.37s/it] + 4%|▍ | 321/7378 [1:06:10<23:52:53, 12.18s/it] + +{'loss': 0.5405, 'learning_rate': 1.9990556540779496e-05, 'epoch': 0.04} + + 4%|▍ | 321/7378 [1:06:10<23:52:53, 12.18s/it] + 4%|▍ | 322/7378 [1:06:23<24:10:31, 12.33s/it] + +{'loss': 0.4906, 'learning_rate': 1.9990364831102317e-05, 'epoch': 0.04} + + 4%|▍ | 322/7378 [1:06:23<24:10:31, 12.33s/it] + 4%|▍ | 323/7378 [1:06:35<23:59:41, 12.24s/it] + +{'loss': 0.4786, 'learning_rate': 1.9990171195938885e-05, 'epoch': 0.04} + + 4%|▍ | 323/7378 [1:06:35<23:59:41, 12.24s/it] + 4%|▍ | 324/7378 [1:06:47<24:01:06, 12.26s/it] + +{'loss': 0.6138, 'learning_rate': 1.9989975635326517e-05, 'epoch': 0.04} + + 4%|▍ | 324/7378 [1:06:47<24:01:06, 12.26s/it] + 4%|▍ | 325/7378 
[1:06:59<23:55:32, 12.21s/it] + +{'loss': 0.4869, 'learning_rate': 1.9989778149302902e-05, 'epoch': 0.04} + + 4%|▍ | 325/7378 [1:06:59<23:55:32, 12.21s/it] + 4%|▍ | 326/7378 [1:07:12<24:00:45, 12.26s/it] + +{'loss': 0.497, 'learning_rate': 1.9989578737906107e-05, 'epoch': 0.04} + + 4%|▍ | 326/7378 [1:07:12<24:00:45, 12.26s/it] + 4%|▍ | 327/7378 [1:07:24<24:01:34, 12.27s/it] + +{'loss': 0.5364, 'learning_rate': 1.9989377401174566e-05, 'epoch': 0.04} + + 4%|▍ | 327/7378 [1:07:24<24:01:34, 12.27s/it] + 4%|▍ | 328/7378 [1:07:36<24:04:10, 12.29s/it] + +{'loss': 0.5628, 'learning_rate': 1.998917413914708e-05, 'epoch': 0.04} + + 4%|▍ | 328/7378 [1:07:36<24:04:10, 12.29s/it] + 4%|▍ | 329/7378 [1:07:48<23:57:35, 12.24s/it] + +{'loss': 0.5112, 'learning_rate': 1.9988968951862823e-05, 'epoch': 0.04} + + 4%|▍ | 329/7378 [1:07:48<23:57:35, 12.24s/it] + 4%|▍ | 330/7378 [1:08:01<24:17:37, 12.41s/it] + +{'loss': 0.5758, 'learning_rate': 1.9988761839361347e-05, 'epoch': 0.04} + + 4%|▍ | 330/7378 [1:08:01<24:17:37, 12.41s/it] + 4%|▍ | 331/7378 [1:08:14<24:33:52, 12.55s/it] + +{'loss': 0.5242, 'learning_rate': 1.9988552801682572e-05, 'epoch': 0.04} + + 4%|▍ | 331/7378 [1:08:14<24:33:52, 12.55s/it] + 4%|▍ | 332/7378 [1:08:27<24:31:09, 12.53s/it] + +{'loss': 0.5231, 'learning_rate': 1.9988341838866772e-05, 'epoch': 0.04} + + 4%|▍ | 332/7378 [1:08:27<24:31:09, 12.53s/it] + 5%|▍ | 333/7378 [1:08:39<24:14:16, 12.39s/it] + +{'loss': 0.562, 'learning_rate': 1.9988128950954623e-05, 'epoch': 0.05} + + 5%|▍ | 333/7378 [1:08:39<24:14:16, 12.39s/it] + 5%|▍ | 334/7378 [1:08:51<24:16:06, 12.40s/it] + +{'loss': 0.5598, 'learning_rate': 1.9987914137987153e-05, 'epoch': 0.05} + + 5%|▍ | 334/7378 [1:08:51<24:16:06, 12.40s/it][2025-01-23 01:32:02,811] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time + + 5%|▍ | 335/7378 [1:09:05<25:14:25, 12.90s/it] + +{'loss': 0.526, 'learning_rate': 1.9987697400005753e-05, 'epoch': 0.05} + + 5%|▍ | 335/7378 [1:09:05<25:14:25, 12.90s/it] + 5%|▍ | 336/7378 [1:09:18<25:07:38, 12.85s/it] + +{'loss': 0.5908, 'learning_rate': 1.998747873705221e-05, 'epoch': 0.05} + + 5%|▍ | 336/7378 [1:09:18<25:07:38, 12.85s/it] + 5%|▍ | 337/7378 [1:09:30<24:51:52, 12.71s/it] + +{'loss': 0.4979, 'learning_rate': 1.9987258149168654e-05, 'epoch': 0.05} + + 5%|▍ | 337/7378 [1:09:30<24:51:52, 12.71s/it] + 5%|▍ | 338/7378 [1:09:43<24:42:49, 12.64s/it] + +{'loss': 0.6002, 'learning_rate': 1.9987035636397615e-05, 'epoch': 0.05} + + 5%|▍ | 338/7378 [1:09:43<24:42:49, 12.64s/it] + 5%|▍ | 339/7378 [1:09:55<24:15:38, 12.41s/it] + +{'loss': 0.5011, 'learning_rate': 1.9986811198781966e-05, 'epoch': 0.05} + + 5%|▍ | 339/7378 [1:09:55<24:15:38, 12.41s/it] + 5%|▍ | 340/7378 [1:10:07<24:11:35, 12.38s/it] + +{'loss': 0.5658, 'learning_rate': 1.998658483636497e-05, 'epoch': 0.05} + + 5%|▍ | 340/7378 [1:10:07<24:11:35, 12.38s/it] + 5%|▍ | 341/7378 [1:10:19<24:06:44, 12.34s/it] + +{'loss': 0.5802, 'learning_rate': 1.9986356549190253e-05, 'epoch': 0.05} + + 5%|▍ | 341/7378 [1:10:19<24:06:44, 12.34s/it] + 5%|▍ | 342/7378 [1:10:32<24:07:35, 12.34s/it] + +{'loss': 0.5583, 'learning_rate': 1.9986126337301814e-05, 'epoch': 0.05} + + 5%|▍ | 342/7378 [1:10:32<24:07:35, 12.34s/it] + 5%|▍ | 343/7378 [1:10:44<24:01:13, 12.29s/it] + +{'loss': 0.4685, 'learning_rate': 1.998589420074402e-05, 'epoch': 0.05} + + 5%|▍ | 343/7378 [1:10:44<24:01:13, 12.29s/it] + 5%|▍ | 344/7378 [1:10:57<24:22:59, 12.48s/it] + +{'loss': 0.5337, 'learning_rate': 1.998566013956162e-05, 'epoch': 0.05} + + 5%|▍ | 344/7378 [1:10:57<24:22:59, 12.48s/it] + 5%|▍ | 345/7378 [1:11:09<24:30:22, 12.54s/it] + +{'loss': 0.5172, 
'learning_rate': 1.998542415379972e-05, 'epoch': 0.05} + + 5%|▍ | 345/7378 [1:11:09<24:30:22, 12.54s/it] + 5%|▍ | 346/7378 [1:11:21<24:11:18, 12.38s/it] + +{'loss': 0.5546, 'learning_rate': 1.99851862435038e-05, 'epoch': 0.05} + + 5%|▍ | 346/7378 [1:11:21<24:11:18, 12.38s/it] + 5%|▍ | 347/7378 [1:11:33<23:53:34, 12.23s/it] + +{'loss': 0.5371, 'learning_rate': 1.9984946408719718e-05, 'epoch': 0.05} + + 5%|▍ | 347/7378 [1:11:33<23:53:34, 12.23s/it] + 5%|▍ | 348/7378 [1:11:46<23:59:38, 12.29s/it] + +{'loss': 0.4944, 'learning_rate': 1.9984704649493696e-05, 'epoch': 0.05} + + 5%|▍ | 348/7378 [1:11:46<23:59:38, 12.29s/it] + 5%|▍ | 349/7378 [1:11:58<24:13:17, 12.41s/it] + +{'loss': 0.4969, 'learning_rate': 1.998446096587233e-05, 'epoch': 0.05} + + 5%|▍ | 349/7378 [1:11:58<24:13:17, 12.41s/it] + 5%|▍ | 350/7378 [1:12:10<24:02:27, 12.31s/it] + +{'loss': 0.5378, 'learning_rate': 1.9984215357902586e-05, 'epoch': 0.05} + + 5%|▍ | 350/7378 [1:12:10<24:02:27, 12.31s/it] + 5%|▍ | 351/7378 [1:12:22<23:46:53, 12.18s/it] + +{'loss': 0.5602, 'learning_rate': 1.9983967825631803e-05, 'epoch': 0.05} + + 5%|▍ | 351/7378 [1:12:22<23:46:53, 12.18s/it] + 5%|▍ | 352/7378 [1:12:35<23:53:32, 12.24s/it] + +{'loss': 0.4578, 'learning_rate': 1.9983718369107684e-05, 'epoch': 0.05} + + 5%|▍ | 352/7378 [1:12:35<23:53:32, 12.24s/it] + 5%|▍ | 353/7378 [1:12:47<24:07:52, 12.37s/it] + +{'loss': 0.4888, 'learning_rate': 1.9983466988378314e-05, 'epoch': 0.05} + + 5%|▍ | 353/7378 [1:12:47<24:07:52, 12.37s/it] + 5%|▍ | 354/7378 [1:12:59<23:54:08, 12.25s/it] + +{'loss': 0.471, 'learning_rate': 1.9983213683492143e-05, 'epoch': 0.05} + + 5%|▍ | 354/7378 [1:12:59<23:54:08, 12.25s/it] + 5%|▍ | 355/7378 [1:13:11<23:42:53, 12.16s/it] + +{'loss': 0.5458, 'learning_rate': 1.9982958454497984e-05, 'epoch': 0.05} + + 5%|▍ | 355/7378 [1:13:11<23:42:53, 12.16s/it] + 5%|▍ | 356/7378 [1:13:24<23:50:07, 12.22s/it] + +{'loss': 0.5111, 'learning_rate': 1.9982701301445033e-05, 'epoch': 0.05} + + 5%|▍ | 356/7378 
[1:13:24<23:50:07, 12.22s/it] + 5%|▍ | 357/7378 [1:13:36<24:12:13, 12.41s/it] + +{'loss': 0.5206, 'learning_rate': 1.998244222438285e-05, 'epoch': 0.05} + + 5%|▍ | 357/7378 [1:13:36<24:12:13, 12.41s/it] + 5%|▍ | 358/7378 [1:13:49<24:02:08, 12.33s/it] + +{'loss': 0.5262, 'learning_rate': 1.9982181223361373e-05, 'epoch': 0.05} + + 5%|▍ | 358/7378 [1:13:49<24:02:08, 12.33s/it] + 5%|▍ | 359/7378 [1:14:01<24:14:59, 12.44s/it] + +{'loss': 0.5476, 'learning_rate': 1.9981918298430905e-05, 'epoch': 0.05} + + 5%|▍ | 359/7378 [1:14:01<24:14:59, 12.44s/it] + 5%|▍ | 360/7378 [1:14:14<24:10:47, 12.40s/it] + +{'loss': 0.4988, 'learning_rate': 1.9981653449642114e-05, 'epoch': 0.05} + + 5%|▍ | 360/7378 [1:14:14<24:10:47, 12.40s/it] + 5%|▍ | 361/7378 [1:14:26<24:00:07, 12.31s/it] + +{'loss': 0.5293, 'learning_rate': 1.9981386677046052e-05, 'epoch': 0.05} + + 5%|▍ | 361/7378 [1:14:26<24:00:07, 12.31s/it] + 5%|▍ | 362/7378 [1:14:38<24:09:57, 12.40s/it] + +{'loss': 0.5404, 'learning_rate': 1.9981117980694137e-05, 'epoch': 0.05} + + 5%|▍ | 362/7378 [1:14:38<24:09:57, 12.40s/it] + 5%|▍ | 363/7378 [1:14:51<24:13:31, 12.43s/it] + +{'loss': 0.4862, 'learning_rate': 1.9980847360638144e-05, 'epoch': 0.05} + + 5%|▍ | 363/7378 [1:14:51<24:13:31, 12.43s/it] + 5%|▍ | 364/7378 [1:15:03<24:00:13, 12.32s/it] + +{'loss': 0.5617, 'learning_rate': 1.9980574816930245e-05, 'epoch': 0.05} + + 5%|▍ | 364/7378 [1:15:03<24:00:13, 12.32s/it] + 5%|▍ | 365/7378 [1:15:15<24:03:38, 12.35s/it] + +{'loss': 0.4999, 'learning_rate': 1.998030034962296e-05, 'epoch': 0.05} + + 5%|▍ | 365/7378 [1:15:15<24:03:38, 12.35s/it] + 5%|▍ | 366/7378 [1:15:28<24:06:19, 12.38s/it] + +{'loss': 0.5506, 'learning_rate': 1.9980023958769195e-05, 'epoch': 0.05} + + 5%|▍ | 366/7378 [1:15:28<24:06:19, 12.38s/it] + 5%|▍ | 367/7378 [1:15:40<24:06:41, 12.38s/it] + +{'loss': 0.5249, 'learning_rate': 1.9979745644422213e-05, 'epoch': 0.05} + + 5%|▍ | 367/7378 [1:15:40<24:06:41, 12.38s/it] + 5%|▍ | 368/7378 [1:15:52<23:56:26, 12.29s/it] + 
+{'loss': 0.5182, 'learning_rate': 1.9979465406635654e-05, 'epoch': 0.05} + + 5%|▍ | 368/7378 [1:15:52<23:56:26, 12.29s/it] + 5%|▌ | 369/7378 [1:16:04<23:45:00, 12.20s/it] + +{'loss': 0.5124, 'learning_rate': 1.9979183245463538e-05, 'epoch': 0.05} + + 5%|▌ | 369/7378 [1:16:04<23:45:00, 12.20s/it] + 5%|▌ | 370/7378 [1:16:17<24:01:07, 12.34s/it] + +{'loss': 0.5424, 'learning_rate': 1.9978899160960238e-05, 'epoch': 0.05} + + 5%|▌ | 370/7378 [1:16:17<24:01:07, 12.34s/it] + 5%|▌ | 371/7378 [1:16:29<23:57:40, 12.31s/it] + +{'loss': 0.5628, 'learning_rate': 1.9978613153180516e-05, 'epoch': 0.05} + + 5%|▌ | 371/7378 [1:16:29<23:57:40, 12.31s/it] + 5%|▌ | 372/7378 [1:16:42<24:02:14, 12.35s/it] + +{'loss': 0.5394, 'learning_rate': 1.997832522217949e-05, 'epoch': 0.05} + + 5%|▌ | 372/7378 [1:16:42<24:02:14, 12.35s/it] + 5%|▌ | 373/7378 [1:16:54<23:57:34, 12.31s/it] + +{'loss': 0.4878, 'learning_rate': 1.997803536801265e-05, 'epoch': 0.05} + + 5%|▌ | 373/7378 [1:16:54<23:57:34, 12.31s/it] + 5%|▌ | 374/7378 [1:17:06<24:05:02, 12.38s/it] + +{'loss': 0.5135, 'learning_rate': 1.9977743590735866e-05, 'epoch': 0.05} + + 5%|▌ | 374/7378 [1:17:06<24:05:02, 12.38s/it] + 5%|▌ | 375/7378 [1:17:19<24:13:25, 12.45s/it] + +{'loss': 0.5348, 'learning_rate': 1.9977449890405378e-05, 'epoch': 0.05} + + 5%|▌ | 375/7378 [1:17:19<24:13:25, 12.45s/it] + 5%|▌ | 376/7378 [1:17:31<24:07:50, 12.41s/it] + +{'loss': 0.5535, 'learning_rate': 1.9977154267077786e-05, 'epoch': 0.05} + + 5%|▌ | 376/7378 [1:17:31<24:07:50, 12.41s/it] + 5%|▌ | 377/7378 [1:17:43<23:56:26, 12.31s/it] + +{'loss': 0.5163, 'learning_rate': 1.9976856720810064e-05, 'epoch': 0.05} + + 5%|▌ | 377/7378 [1:17:43<23:56:26, 12.31s/it] + 5%|▌ | 378/7378 [1:17:56<24:04:11, 12.38s/it] + +{'loss': 0.5183, 'learning_rate': 1.9976557251659567e-05, 'epoch': 0.05} + + 5%|▌ | 378/7378 [1:17:56<24:04:11, 12.38s/it] + 5%|▌ | 379/7378 [1:18:08<23:51:40, 12.27s/it] + +{'loss': 0.5113, 'learning_rate': 1.997625585968401e-05, 'epoch': 0.05} + + 5%|▌ | 
379/7378 [1:18:08<23:51:40, 12.27s/it] + 5%|▌ | 380/7378 [1:18:20<23:44:40, 12.21s/it] + +{'loss': 0.5556, 'learning_rate': 1.9975952544941478e-05, 'epoch': 0.05} + + 5%|▌ | 380/7378 [1:18:20<23:44:40, 12.21s/it] + 5%|▌ | 381/7378 [1:18:32<23:47:03, 12.24s/it] + +{'loss': 0.5203, 'learning_rate': 1.9975647307490433e-05, 'epoch': 0.05} + + 5%|▌ | 381/7378 [1:18:32<23:47:03, 12.24s/it] + 5%|▌ | 382/7378 [1:18:45<23:52:34, 12.29s/it] + +{'loss': 0.5157, 'learning_rate': 1.9975340147389707e-05, 'epoch': 0.05} + + 5%|▌ | 382/7378 [1:18:45<23:52:34, 12.29s/it] + 5%|▌ | 383/7378 [1:18:57<24:00:05, 12.35s/it] + +{'loss': 0.5955, 'learning_rate': 1.9975031064698497e-05, 'epoch': 0.05} + + 5%|▌ | 383/7378 [1:18:57<24:00:05, 12.35s/it] + 5%|▌ | 384/7378 [1:19:10<24:03:08, 12.38s/it] + +{'loss': 0.5158, 'learning_rate': 1.9974720059476375e-05, 'epoch': 0.05} + + 5%|▌ | 384/7378 [1:19:10<24:03:08, 12.38s/it] + 5%|▌ | 385/7378 [1:19:22<23:53:49, 12.30s/it] + +{'loss': 0.5041, 'learning_rate': 1.997440713178328e-05, 'epoch': 0.05} + + 5%|▌ | 385/7378 [1:19:22<23:53:49, 12.30s/it] + 5%|▌ | 386/7378 [1:19:34<23:54:47, 12.31s/it] + +{'loss': 0.5769, 'learning_rate': 1.997409228167953e-05, 'epoch': 0.05} + + 5%|▌ | 386/7378 [1:19:34<23:54:47, 12.31s/it] + 5%|▌ | 387/7378 [1:19:46<23:37:18, 12.16s/it] + +{'loss': 0.5113, 'learning_rate': 1.99737755092258e-05, 'epoch': 0.05} + + 5%|▌ | 387/7378 [1:19:46<23:37:18, 12.16s/it] + 5%|▌ | 388/7378 [1:19:58<23:40:25, 12.19s/it] + +{'loss': 0.4844, 'learning_rate': 1.997345681448315e-05, 'epoch': 0.05} + + 5%|▌ | 388/7378 [1:19:58<23:40:25, 12.19s/it] + 5%|▌ | 389/7378 [1:20:10<23:38:56, 12.18s/it] + +{'loss': 0.5814, 'learning_rate': 1.9973136197512998e-05, 'epoch': 0.05} + + 5%|▌ | 389/7378 [1:20:10<23:38:56, 12.18s/it] + 5%|▌ | 390/7378 [1:20:23<23:43:09, 12.22s/it] + +{'loss': 0.4944, 'learning_rate': 1.997281365837714e-05, 'epoch': 0.05} + + 5%|▌ | 390/7378 [1:20:23<23:43:09, 12.22s/it] + 5%|▌ | 391/7378 [1:20:35<23:38:09, 12.18s/it] + 
+{'loss': 0.4887, 'learning_rate': 1.9972489197137742e-05, 'epoch': 0.05} + + 5%|▌ | 391/7378 [1:20:35<23:38:09, 12.18s/it] + 5%|▌ | 392/7378 [1:20:47<23:40:56, 12.20s/it] + +{'loss': 0.5334, 'learning_rate': 1.9972162813857334e-05, 'epoch': 0.05} + + 5%|▌ | 392/7378 [1:20:47<23:40:56, 12.20s/it] + 5%|▌ | 393/7378 [1:20:59<23:39:38, 12.19s/it] + +{'loss': 0.5482, 'learning_rate': 1.9971834508598826e-05, 'epoch': 0.05} + + 5%|▌ | 393/7378 [1:20:59<23:39:38, 12.19s/it] + 5%|▌ | 394/7378 [1:21:11<23:41:13, 12.21s/it] + +{'loss': 0.5629, 'learning_rate': 1.997150428142549e-05, 'epoch': 0.05} + + 5%|▌ | 394/7378 [1:21:11<23:41:13, 12.21s/it] + 5%|▌ | 395/7378 [1:21:24<23:43:57, 12.24s/it] + +{'loss': 0.5561, 'learning_rate': 1.9971172132400977e-05, 'epoch': 0.05} + + 5%|▌ | 395/7378 [1:21:24<23:43:57, 12.24s/it] + 5%|▌ | 396/7378 [1:21:36<23:44:08, 12.24s/it] + +{'loss': 0.4972, 'learning_rate': 1.99708380615893e-05, 'epoch': 0.05} + + 5%|▌ | 396/7378 [1:21:36<23:44:08, 12.24s/it] + 5%|▌ | 397/7378 [1:21:48<23:49:46, 12.29s/it] + +{'loss': 0.5631, 'learning_rate': 1.9970502069054846e-05, 'epoch': 0.05} + + 5%|▌ | 397/7378 [1:21:48<23:49:46, 12.29s/it] + 5%|▌ | 398/7378 [1:22:01<24:00:05, 12.38s/it] + +{'loss': 0.541, 'learning_rate': 1.9970164154862375e-05, 'epoch': 0.05} + + 5%|▌ | 398/7378 [1:22:01<24:00:05, 12.38s/it] + 5%|▌ | 399/7378 [1:22:13<23:48:29, 12.28s/it] + +{'loss': 0.4783, 'learning_rate': 1.996982431907701e-05, 'epoch': 0.05} + + 5%|▌ | 399/7378 [1:22:13<23:48:29, 12.28s/it] + 5%|▌ | 400/7378 [1:22:26<24:01:12, 12.39s/it] + +{'loss': 0.5298, 'learning_rate': 1.996948256176425e-05, 'epoch': 0.05} + + 5%|▌ | 400/7378 [1:22:26<24:01:12, 12.39s/it] + 5%|▌ | 401/7378 [1:22:38<24:04:28, 12.42s/it] + +{'loss': 0.5585, 'learning_rate': 1.996913888298997e-05, 'epoch': 0.05} + + 5%|▌ | 401/7378 [1:22:38<24:04:28, 12.42s/it] + 5%|▌ | 402/7378 [1:22:50<23:53:36, 12.33s/it] + +{'loss': 0.5, 'learning_rate': 1.99687932828204e-05, 'epoch': 0.05} + + 5%|▌ | 402/7378 
[1:22:50<23:53:36, 12.33s/it] + 5%|▌ | 403/7378 [1:23:03<24:16:37, 12.53s/it] + +{'loss': 0.6663, 'learning_rate': 1.9968445761322154e-05, 'epoch': 0.05} + + 5%|▌ | 403/7378 [1:23:03<24:16:37, 12.53s/it] + 5%|▌ | 404/7378 [1:23:16<24:22:10, 12.58s/it] + +{'loss': 0.5903, 'learning_rate': 1.996809631856221e-05, 'epoch': 0.05} + + 5%|▌ | 404/7378 [1:23:16<24:22:10, 12.58s/it] + 5%|▌ | 405/7378 [1:23:28<24:15:53, 12.53s/it] + +{'loss': 0.5398, 'learning_rate': 1.9967744954607916e-05, 'epoch': 0.05} + + 5%|▌ | 405/7378 [1:23:28<24:15:53, 12.53s/it] + 6%|▌ | 406/7378 [1:23:41<24:11:23, 12.49s/it] + +{'loss': 0.5336, 'learning_rate': 1.9967391669526995e-05, 'epoch': 0.06} + + 6%|▌ | 406/7378 [1:23:41<24:11:23, 12.49s/it] + 6%|▌ | 407/7378 [1:23:53<24:01:05, 12.40s/it] + +{'loss': 0.5274, 'learning_rate': 1.9967036463387533e-05, 'epoch': 0.06} + + 6%|▌ | 407/7378 [1:23:53<24:01:05, 12.40s/it] + 6%|▌ | 408/7378 [1:24:05<24:05:59, 12.45s/it] + +{'loss': 0.5355, 'learning_rate': 1.9966679336257995e-05, 'epoch': 0.06} + + 6%|▌ | 408/7378 [1:24:05<24:05:59, 12.45s/it] + 6%|▌ | 409/7378 [1:24:18<24:13:45, 12.52s/it] + +{'loss': 0.6343, 'learning_rate': 1.996632028820721e-05, 'epoch': 0.06} + + 6%|▌ | 409/7378 [1:24:18<24:13:45, 12.52s/it] + 6%|▌ | 410/7378 [1:24:31<24:10:16, 12.49s/it] + +{'loss': 0.5064, 'learning_rate': 1.996595931930438e-05, 'epoch': 0.06} + + 6%|▌ | 410/7378 [1:24:31<24:10:16, 12.49s/it] + 6%|▌ | 411/7378 [1:24:43<24:08:45, 12.48s/it] + +{'loss': 0.5352, 'learning_rate': 1.996559642961907e-05, 'epoch': 0.06} + + 6%|▌ | 411/7378 [1:24:43<24:08:45, 12.48s/it] + 6%|▌ | 412/7378 [1:24:56<24:15:26, 12.54s/it] + +{'loss': 0.5615, 'learning_rate': 1.9965231619221232e-05, 'epoch': 0.06} + + 6%|▌ | 412/7378 [1:24:56<24:15:26, 12.54s/it] + 6%|▌ | 413/7378 [1:25:08<24:09:17, 12.48s/it] + +{'loss': 0.5686, 'learning_rate': 1.9964864888181168e-05, 'epoch': 0.06} + + 6%|▌ | 413/7378 [1:25:08<24:09:17, 12.48s/it] + 6%|▌ | 414/7378 [1:25:21<24:23:59, 12.61s/it] + +{'loss': 
0.623, 'learning_rate': 1.996449623656956e-05, 'epoch': 0.06} + + 6%|▌ | 414/7378 [1:25:21<24:23:59, 12.61s/it] + 6%|▌ | 415/7378 [1:25:34<24:22:05, 12.60s/it] + +{'loss': 0.557, 'learning_rate': 1.996412566445747e-05, 'epoch': 0.06} + + 6%|▌ | 415/7378 [1:25:34<24:22:05, 12.60s/it] + 6%|▌ | 416/7378 [1:25:46<24:11:40, 12.51s/it] + +{'loss': 0.5277, 'learning_rate': 1.996375317191631e-05, 'epoch': 0.06} + + 6%|▌ | 416/7378 [1:25:46<24:11:40, 12.51s/it] + 6%|▌ | 417/7378 [1:25:58<23:59:18, 12.41s/it] + +{'loss': 0.5475, 'learning_rate': 1.996337875901787e-05, 'epoch': 0.06} + + 6%|▌ | 417/7378 [1:25:58<23:59:18, 12.41s/it] + 6%|▌ | 418/7378 [1:26:11<24:09:38, 12.50s/it] + +{'loss': 0.5453, 'learning_rate': 1.9963002425834322e-05, 'epoch': 0.06} + + 6%|▌ | 418/7378 [1:26:11<24:09:38, 12.50s/it] + 6%|▌ | 419/7378 [1:26:23<23:52:13, 12.35s/it] + +{'loss': 0.5569, 'learning_rate': 1.9962624172438195e-05, 'epoch': 0.06} + + 6%|▌ | 419/7378 [1:26:23<23:52:13, 12.35s/it] + 6%|▌ | 420/7378 [1:26:35<24:00:21, 12.42s/it] + +{'loss': 0.553, 'learning_rate': 1.996224399890239e-05, 'epoch': 0.06} + + 6%|▌ | 420/7378 [1:26:35<24:00:21, 12.42s/it] + 6%|▌ | 421/7378 [1:26:48<23:56:30, 12.39s/it] + +{'loss': 0.5467, 'learning_rate': 1.9961861905300177e-05, 'epoch': 0.06} + + 6%|▌ | 421/7378 [1:26:48<23:56:30, 12.39s/it] + 6%|▌ | 422/7378 [1:27:00<23:52:36, 12.36s/it] + +{'loss': 0.5063, 'learning_rate': 1.9961477891705203e-05, 'epoch': 0.06} + + 6%|▌ | 422/7378 [1:27:00<23:52:36, 12.36s/it] + 6%|▌ | 423/7378 [1:27:12<23:58:57, 12.41s/it] + +{'loss': 0.5476, 'learning_rate': 1.9961091958191476e-05, 'epoch': 0.06} + + 6%|▌ | 423/7378 [1:27:12<23:58:57, 12.41s/it] + 6%|▌ | 424/7378 [1:27:25<23:58:26, 12.41s/it] + +{'loss': 0.5704, 'learning_rate': 1.9960704104833383e-05, 'epoch': 0.06} + + 6%|▌ | 424/7378 [1:27:25<23:58:26, 12.41s/it] + 6%|▌ | 425/7378 [1:27:37<23:57:11, 12.40s/it] + +{'loss': 0.5041, 'learning_rate': 1.9960314331705676e-05, 'epoch': 0.06} + + 6%|▌ | 425/7378 
[1:27:37<23:57:11, 12.40s/it] + 6%|▌ | 426/7378 [1:27:49<23:39:00, 12.25s/it] + +{'loss': 0.4986, 'learning_rate': 1.9959922638883473e-05, 'epoch': 0.06} + + 6%|▌ | 426/7378 [1:27:49<23:39:00, 12.25s/it] + 6%|▌ | 427/7378 [1:28:01<23:44:17, 12.29s/it] + +{'loss': 0.5427, 'learning_rate': 1.995952902644227e-05, 'epoch': 0.06} + + 6%|▌ | 427/7378 [1:28:02<23:44:17, 12.29s/it] + 6%|▌ | 428/7378 [1:28:14<23:45:30, 12.31s/it] + +{'loss': 0.5317, 'learning_rate': 1.9959133494457936e-05, 'epoch': 0.06} + + 6%|▌ | 428/7378 [1:28:14<23:45:30, 12.31s/it] + 6%|▌ | 429/7378 [1:28:26<23:36:49, 12.23s/it] + +{'loss': 0.5132, 'learning_rate': 1.9958736043006693e-05, 'epoch': 0.06} + + 6%|▌ | 429/7378 [1:28:26<23:36:49, 12.23s/it] + 6%|▌ | 430/7378 [1:28:38<23:37:49, 12.24s/it] + +{'loss': 0.4674, 'learning_rate': 1.9958336672165147e-05, 'epoch': 0.06} + + 6%|▌ | 430/7378 [1:28:38<23:37:49, 12.24s/it] + 6%|▌ | 431/7378 [1:28:50<23:38:40, 12.25s/it] + +{'loss': 0.4896, 'learning_rate': 1.9957935382010273e-05, 'epoch': 0.06} + + 6%|▌ | 431/7378 [1:28:50<23:38:40, 12.25s/it] + 6%|▌ | 432/7378 [1:29:04<24:15:53, 12.58s/it] + +{'loss': 0.5392, 'learning_rate': 1.995753217261941e-05, 'epoch': 0.06} + + 6%|▌ | 432/7378 [1:29:04<24:15:53, 12.58s/it] + 6%|▌ | 433/7378 [1:29:16<23:49:18, 12.35s/it] + +{'loss': 0.4835, 'learning_rate': 1.9957127044070277e-05, 'epoch': 0.06} + + 6%|▌ | 433/7378 [1:29:16<23:49:18, 12.35s/it] + 6%|▌ | 434/7378 [1:29:28<23:44:25, 12.31s/it] + +{'loss': 0.5744, 'learning_rate': 1.9956719996440947e-05, 'epoch': 0.06} + + 6%|▌ | 434/7378 [1:29:28<23:44:25, 12.31s/it] + 6%|▌ | 435/7378 [1:29:40<23:47:51, 12.34s/it] + +{'loss': 0.5319, 'learning_rate': 1.995631102980988e-05, 'epoch': 0.06} + + 6%|▌ | 435/7378 [1:29:40<23:47:51, 12.34s/it] + 6%|▌ | 436/7378 [1:29:52<23:43:56, 12.31s/it] + +{'loss': 0.5582, 'learning_rate': 1.995590014425589e-05, 'epoch': 0.06} + + 6%|▌ | 436/7378 [1:29:52<23:43:56, 12.31s/it] + 6%|▌ | 437/7378 [1:30:05<23:40:24, 12.28s/it] + +{'loss': 
0.5525, 'learning_rate': 1.9955487339858174e-05, 'epoch': 0.06} + + 6%|▌ | 437/7378 [1:30:05<23:40:24, 12.28s/it] + 6%|▌ | 438/7378 [1:30:17<23:46:18, 12.33s/it] + +{'loss': 0.4792, 'learning_rate': 1.9955072616696294e-05, 'epoch': 0.06} + + 6%|▌ | 438/7378 [1:30:17<23:46:18, 12.33s/it] + 6%|▌ | 439/7378 [1:30:29<23:34:56, 12.23s/it] + +{'loss': 0.5268, 'learning_rate': 1.9954655974850183e-05, 'epoch': 0.06} + + 6%|▌ | 439/7378 [1:30:29<23:34:56, 12.23s/it] + 6%|▌ | 440/7378 [1:30:42<23:41:39, 12.29s/it] + +{'loss': 0.5065, 'learning_rate': 1.9954237414400133e-05, 'epoch': 0.06} + + 6%|▌ | 440/7378 [1:30:42<23:41:39, 12.29s/it] + 6%|▌ | 441/7378 [1:30:54<23:40:39, 12.29s/it] + +{'loss': 0.6443, 'learning_rate': 1.9953816935426825e-05, 'epoch': 0.06} + + 6%|▌ | 441/7378 [1:30:54<23:40:39, 12.29s/it] + 6%|▌ | 442/7378 [1:31:06<23:43:52, 12.32s/it] + +{'loss': 0.5591, 'learning_rate': 1.9953394538011294e-05, 'epoch': 0.06} + + 6%|▌ | 442/7378 [1:31:06<23:43:52, 12.32s/it] + 6%|▌ | 443/7378 [1:31:19<23:45:41, 12.33s/it] + +{'loss': 0.5447, 'learning_rate': 1.995297022223495e-05, 'epoch': 0.06} + + 6%|▌ | 443/7378 [1:31:19<23:45:41, 12.33s/it] + 6%|▌ | 444/7378 [1:31:31<23:43:43, 12.32s/it] + +{'loss': 0.6148, 'learning_rate': 1.9952543988179584e-05, 'epoch': 0.06} + + 6%|▌ | 444/7378 [1:31:31<23:43:43, 12.32s/it] + 6%|▌ | 445/7378 [1:31:43<23:53:51, 12.41s/it] + +{'loss': 0.5483, 'learning_rate': 1.995211583592733e-05, 'epoch': 0.06} + + 6%|▌ | 445/7378 [1:31:43<23:53:51, 12.41s/it] + 6%|▌ | 446/7378 [1:31:56<23:46:45, 12.35s/it] + +{'loss': 0.54, 'learning_rate': 1.9951685765560717e-05, 'epoch': 0.06} + + 6%|▌ | 446/7378 [1:31:56<23:46:45, 12.35s/it] + 6%|▌ | 447/7378 [1:32:08<23:44:08, 12.33s/it] + +{'loss': 0.5316, 'learning_rate': 1.9951253777162634e-05, 'epoch': 0.06} + + 6%|▌ | 447/7378 [1:32:08<23:44:08, 12.33s/it] + 6%|▌ | 448/7378 [1:32:20<23:37:35, 12.27s/it] + +{'loss': 0.5066, 'learning_rate': 1.995081987081634e-05, 'epoch': 0.06} + + 6%|▌ | 448/7378 
[1:32:20<23:37:35, 12.27s/it] + 6%|▌ | 449/7378 [1:32:33<23:43:27, 12.33s/it] + +{'loss': 0.5442, 'learning_rate': 1.9950384046605458e-05, 'epoch': 0.06} + + 6%|▌ | 449/7378 [1:32:33<23:43:27, 12.33s/it] + 6%|▌ | 450/7378 [1:32:45<23:37:20, 12.27s/it] + +{'loss': 0.5416, 'learning_rate': 1.994994630461399e-05, 'epoch': 0.06} + + 6%|▌ | 450/7378 [1:32:45<23:37:20, 12.27s/it] + 6%|▌ | 451/7378 [1:32:57<23:34:55, 12.26s/it] + +{'loss': 0.5194, 'learning_rate': 1.9949506644926308e-05, 'epoch': 0.06} + + 6%|▌ | 451/7378 [1:32:57<23:34:55, 12.26s/it] + 6%|▌ | 452/7378 [1:33:09<23:37:22, 12.28s/it] + +{'loss': 0.5546, 'learning_rate': 1.9949065067627144e-05, 'epoch': 0.06} + + 6%|▌ | 452/7378 [1:33:09<23:37:22, 12.28s/it] + 6%|▌ | 453/7378 [1:33:21<23:24:19, 12.17s/it] + +{'loss': 0.4921, 'learning_rate': 1.9948621572801604e-05, 'epoch': 0.06} + + 6%|▌ | 453/7378 [1:33:21<23:24:19, 12.17s/it] + 6%|▌ | 454/7378 [1:33:33<23:19:28, 12.13s/it] + +{'loss': 0.4914, 'learning_rate': 1.994817616053517e-05, 'epoch': 0.06} + + 6%|▌ | 454/7378 [1:33:33<23:19:28, 12.13s/it] + 6%|▌ | 455/7378 [1:33:46<23:34:12, 12.26s/it] + +{'loss': 0.5412, 'learning_rate': 1.994772883091369e-05, 'epoch': 0.06} + + 6%|▌ | 455/7378 [1:33:46<23:34:12, 12.26s/it] + 6%|▌ | 456/7378 [1:33:58<23:40:25, 12.31s/it] + +{'loss': 0.5391, 'learning_rate': 1.994727958402337e-05, 'epoch': 0.06} + + 6%|▌ | 456/7378 [1:33:58<23:40:25, 12.31s/it] + 6%|▌ | 457/7378 [1:34:11<23:53:24, 12.43s/it] + +{'loss': 0.5708, 'learning_rate': 1.99468284199508e-05, 'epoch': 0.06} + + 6%|▌ | 457/7378 [1:34:11<23:53:24, 12.43s/it] + 6%|▌ | 458/7378 [1:34:23<23:48:27, 12.39s/it] + +{'loss': 0.5416, 'learning_rate': 1.994637533878294e-05, 'epoch': 0.06} + + 6%|▌ | 458/7378 [1:34:23<23:48:27, 12.39s/it] + 6%|▌ | 459/7378 [1:34:36<23:49:18, 12.39s/it] + +{'loss': 0.6172, 'learning_rate': 1.994592034060711e-05, 'epoch': 0.06} + + 6%|▌ | 459/7378 [1:34:36<23:49:18, 12.39s/it] + 6%|▌ | 460/7378 [1:34:48<23:37:46, 12.30s/it] + +{'loss': 
0.5343, 'learning_rate': 1.9945463425511002e-05, 'epoch': 0.06} + + 6%|▌ | 460/7378 [1:34:48<23:37:46, 12.30s/it] + 6%|▌ | 461/7378 [1:35:00<23:54:10, 12.44s/it] + +{'loss': 0.521, 'learning_rate': 1.9945004593582682e-05, 'epoch': 0.06} + + 6%|▌ | 461/7378 [1:35:00<23:54:10, 12.44s/it] + 6%|▋ | 462/7378 [1:35:13<23:47:54, 12.39s/it] + +{'loss': 0.5579, 'learning_rate': 1.994454384491058e-05, 'epoch': 0.06} + + 6%|▋ | 462/7378 [1:35:13<23:47:54, 12.39s/it] + 6%|▋ | 463/7378 [1:35:25<23:41:01, 12.33s/it] + +{'loss': 0.6348, 'learning_rate': 1.9944081179583503e-05, 'epoch': 0.06} + + 6%|▋ | 463/7378 [1:35:25<23:41:01, 12.33s/it] + 6%|▋ | 464/7378 [1:35:37<23:31:30, 12.25s/it] + +{'loss': 0.4638, 'learning_rate': 1.9943616597690616e-05, 'epoch': 0.06} + + 6%|▋ | 464/7378 [1:35:37<23:31:30, 12.25s/it] + 6%|▋ | 465/7378 [1:35:49<23:13:34, 12.10s/it] + +{'loss': 0.462, 'learning_rate': 1.9943150099321463e-05, 'epoch': 0.06} + + 6%|▋ | 465/7378 [1:35:49<23:13:34, 12.10s/it] + 6%|▋ | 466/7378 [1:36:01<23:27:34, 12.22s/it] + +{'loss': 0.5481, 'learning_rate': 1.9942681684565956e-05, 'epoch': 0.06} + + 6%|▋ | 466/7378 [1:36:01<23:27:34, 12.22s/it] + 6%|▋ | 467/7378 [1:36:13<23:17:44, 12.13s/it] + +{'loss': 0.4891, 'learning_rate': 1.9942211353514375e-05, 'epoch': 0.06} + + 6%|▋ | 467/7378 [1:36:13<23:17:44, 12.13s/it] + 6%|▋ | 468/7378 [1:36:26<23:28:50, 12.23s/it] + +{'loss': 0.4897, 'learning_rate': 1.9941739106257362e-05, 'epoch': 0.06} + + 6%|▋ | 468/7378 [1:36:26<23:28:50, 12.23s/it] + 6%|▋ | 469/7378 [1:36:38<23:32:39, 12.27s/it] + +{'loss': 0.5723, 'learning_rate': 1.9941264942885943e-05, 'epoch': 0.06} + + 6%|▋ | 469/7378 [1:36:38<23:32:39, 12.27s/it] + 6%|▋ | 470/7378 [1:36:50<23:32:07, 12.27s/it] + +{'loss': 0.5371, 'learning_rate': 1.9940788863491503e-05, 'epoch': 0.06} + + 6%|▋ | 470/7378 [1:36:50<23:32:07, 12.27s/it] + 6%|▋ | 471/7378 [1:37:03<23:39:40, 12.33s/it] + +{'loss': 0.5004, 'learning_rate': 1.9940310868165796e-05, 'epoch': 0.06} + + 6%|▋ | 471/7378 
[1:37:03<23:39:40, 12.33s/it] + 6%|▋ | 472/7378 [1:37:15<23:49:51, 12.42s/it] + +{'loss': 0.5529, 'learning_rate': 1.9939830957000955e-05, 'epoch': 0.06} + + 6%|▋ | 472/7378 [1:37:15<23:49:51, 12.42s/it] + 6%|▋ | 473/7378 [1:37:28<23:45:42, 12.39s/it] + +{'loss': 0.5341, 'learning_rate': 1.9939349130089466e-05, 'epoch': 0.06} + + 6%|▋ | 473/7378 [1:37:28<23:45:42, 12.39s/it] + 6%|▋ | 474/7378 [1:37:40<23:54:07, 12.46s/it] + +{'loss': 0.5179, 'learning_rate': 1.99388653875242e-05, 'epoch': 0.06} + + 6%|▋ | 474/7378 [1:37:40<23:54:07, 12.46s/it] + 6%|▋ | 475/7378 [1:37:52<23:43:10, 12.37s/it] + +{'loss': 0.4792, 'learning_rate': 1.9938379729398392e-05, 'epoch': 0.06} + + 6%|▋ | 475/7378 [1:37:52<23:43:10, 12.37s/it] + 6%|▋ | 476/7378 [1:38:05<23:52:27, 12.45s/it] + +{'loss': 0.5468, 'learning_rate': 1.993789215580564e-05, 'epoch': 0.06} + + 6%|▋ | 476/7378 [1:38:05<23:52:27, 12.45s/it] + 6%|▋ | 477/7378 [1:38:17<23:45:10, 12.39s/it] + +{'loss': 0.5128, 'learning_rate': 1.9937402666839924e-05, 'epoch': 0.06} + + 6%|▋ | 477/7378 [1:38:17<23:45:10, 12.39s/it] + 6%|▋ | 478/7378 [1:38:30<24:05:57, 12.57s/it] + +{'loss': 0.5428, 'learning_rate': 1.9936911262595574e-05, 'epoch': 0.06} + + 6%|▋ | 478/7378 [1:38:30<24:05:57, 12.57s/it] + 6%|▋ | 479/7378 [1:38:43<24:23:05, 12.72s/it] + +{'loss': 0.5564, 'learning_rate': 1.9936417943167308e-05, 'epoch': 0.06} + + 6%|▋ | 479/7378 [1:38:43<24:23:05, 12.72s/it] + 7%|▋ | 480/7378 [1:38:56<24:04:30, 12.56s/it] + +{'loss': 0.4805, 'learning_rate': 1.9935922708650203e-05, 'epoch': 0.07} + + 7%|▋ | 480/7378 [1:38:56<24:04:30, 12.56s/it] + 7%|▋ | 481/7378 [1:39:08<23:58:57, 12.52s/it] + +{'loss': 0.6132, 'learning_rate': 1.993542555913971e-05, 'epoch': 0.07} + + 7%|▋ | 481/7378 [1:39:08<23:58:57, 12.52s/it] + 7%|▋ | 482/7378 [1:39:20<23:39:55, 12.35s/it] + +{'loss': 0.5124, 'learning_rate': 1.9934926494731645e-05, 'epoch': 0.07} + + 7%|▋ | 482/7378 [1:39:20<23:39:55, 12.35s/it] + 7%|▋ | 483/7378 [1:39:32<23:34:44, 12.31s/it] + +{'loss': 
0.5146, 'learning_rate': 1.9934425515522197e-05, 'epoch': 0.07} + + 7%|▋ | 483/7378 [1:39:32<23:34:44, 12.31s/it] + 7%|▋ | 484/7378 [1:39:45<23:35:34, 12.32s/it] + +{'loss': 0.5443, 'learning_rate': 1.9933922621607918e-05, 'epoch': 0.07} + + 7%|▋ | 484/7378 [1:39:45<23:35:34, 12.32s/it] + 7%|▋ | 485/7378 [1:39:57<23:27:15, 12.25s/it] + +{'loss': 0.4416, 'learning_rate': 1.9933417813085735e-05, 'epoch': 0.07} + + 7%|▋ | 485/7378 [1:39:57<23:27:15, 12.25s/it] + 7%|▋ | 486/7378 [1:40:09<23:17:38, 12.17s/it] + +{'loss': 0.4888, 'learning_rate': 1.993291109005294e-05, 'epoch': 0.07} + + 7%|▋ | 486/7378 [1:40:09<23:17:38, 12.17s/it] + 7%|▋ | 487/7378 [1:40:21<23:08:29, 12.09s/it] + +{'loss': 0.5316, 'learning_rate': 1.99324024526072e-05, 'epoch': 0.07} + + 7%|▋ | 487/7378 [1:40:21<23:08:29, 12.09s/it] + 7%|▋ | 488/7378 [1:40:33<23:20:16, 12.19s/it] + +{'loss': 0.4617, 'learning_rate': 1.993189190084655e-05, 'epoch': 0.07} + + 7%|▋ | 488/7378 [1:40:33<23:20:16, 12.19s/it] + 7%|▋ | 489/7378 [1:40:45<23:29:55, 12.28s/it] + +{'loss': 0.4937, 'learning_rate': 1.993137943486938e-05, 'epoch': 0.07} + + 7%|▋ | 489/7378 [1:40:45<23:29:55, 12.28s/it] + 7%|▋ | 490/7378 [1:40:59<23:57:20, 12.52s/it] + +{'loss': 0.5069, 'learning_rate': 1.9930865054774466e-05, 'epoch': 0.07} + + 7%|▋ | 490/7378 [1:40:59<23:57:20, 12.52s/it] + 7%|▋ | 491/7378 [1:41:11<23:54:25, 12.50s/it] + +{'loss': 0.5074, 'learning_rate': 1.9930348760660946e-05, 'epoch': 0.07} + + 7%|▋ | 491/7378 [1:41:11<23:54:25, 12.50s/it] + 7%|▋ | 492/7378 [1:41:23<23:43:30, 12.40s/it] + +{'loss': 0.5614, 'learning_rate': 1.992983055262833e-05, 'epoch': 0.07} + + 7%|▋ | 492/7378 [1:41:23<23:43:30, 12.40s/it] + 7%|▋ | 493/7378 [1:41:35<23:20:27, 12.20s/it] + +{'loss': 0.4714, 'learning_rate': 1.992931043077649e-05, 'epoch': 0.07} + + 7%|▋ | 493/7378 [1:41:35<23:20:27, 12.20s/it] + 7%|▋ | 494/7378 [1:41:47<23:27:20, 12.27s/it] + +{'loss': 0.5413, 'learning_rate': 1.9928788395205673e-05, 'epoch': 0.07} + + 7%|▋ | 494/7378 
[1:41:47<23:27:20, 12.27s/it] + 7%|▋ | 495/7378 [1:41:59<23:20:35, 12.21s/it] + +{'loss': 0.4732, 'learning_rate': 1.9928264446016496e-05, 'epoch': 0.07} + + 7%|▋ | 495/7378 [1:41:59<23:20:35, 12.21s/it] + 7%|▋ | 496/7378 [1:42:11<23:10:44, 12.12s/it] + +{'loss': 0.5745, 'learning_rate': 1.992773858330994e-05, 'epoch': 0.07} + + 7%|▋ | 496/7378 [1:42:11<23:10:44, 12.12s/it] + 7%|▋ | 497/7378 [1:42:24<23:20:05, 12.21s/it] + +{'loss': 0.5256, 'learning_rate': 1.9927210807187354e-05, 'epoch': 0.07} + + 7%|▋ | 497/7378 [1:42:24<23:20:05, 12.21s/it] + 7%|▋ | 498/7378 [1:42:36<23:18:29, 12.20s/it] + +{'loss': 0.5426, 'learning_rate': 1.9926681117750463e-05, 'epoch': 0.07} + + 7%|▋ | 498/7378 [1:42:36<23:18:29, 12.20s/it] + 7%|▋ | 499/7378 [1:42:48<23:14:59, 12.17s/it] + +{'loss': 0.4927, 'learning_rate': 1.9926149515101355e-05, 'epoch': 0.07} + + 7%|▋ | 499/7378 [1:42:48<23:14:59, 12.17s/it] + 7%|▋ | 500/7378 [1:43:00<23:19:07, 12.21s/it] + +{'loss': 0.4938, 'learning_rate': 1.9925615999342484e-05, 'epoch': 0.07} + + 7%|▋ | 500/7378 [1:43:00<23:19:07, 12.21s/it] + 7%|▋ | 501/7378 [1:43:13<23:22:25, 12.24s/it] + +{'loss': 0.6007, 'learning_rate': 1.9925080570576686e-05, 'epoch': 0.07} + + 7%|▋ | 501/7378 [1:43:13<23:22:25, 12.24s/it] + 7%|▋ | 502/7378 [1:43:25<23:38:14, 12.38s/it] + +{'loss': 0.5123, 'learning_rate': 1.9924543228907147e-05, 'epoch': 0.07} + + 7%|▋ | 502/7378 [1:43:25<23:38:14, 12.38s/it] + 7%|▋ | 503/7378 [1:43:38<23:40:50, 12.40s/it] + +{'loss': 0.4792, 'learning_rate': 1.9924003974437435e-05, 'epoch': 0.07} + + 7%|▋ | 503/7378 [1:43:38<23:40:50, 12.40s/it] + 7%|▋ | 504/7378 [1:43:50<23:48:10, 12.47s/it] + +{'loss': 0.5345, 'learning_rate': 1.9923462807271482e-05, 'epoch': 0.07} + + 7%|▋ | 504/7378 [1:43:50<23:48:10, 12.47s/it] + 7%|▋ | 505/7378 [1:44:03<23:39:04, 12.39s/it] + +{'loss': 0.4393, 'learning_rate': 1.9922919727513594e-05, 'epoch': 0.07} + + 7%|▋ | 505/7378 [1:44:03<23:39:04, 12.39s/it] + 7%|▋ | 506/7378 [1:44:15<23:39:09, 12.39s/it] + 
+{'loss': 0.4648, 'learning_rate': 1.9922374735268434e-05, 'epoch': 0.07} + + 7%|▋ | 506/7378 [1:44:15<23:39:09, 12.39s/it] + 7%|▋ | 507/7378 [1:44:27<23:37:33, 12.38s/it] + +{'loss': 0.461, 'learning_rate': 1.992182783064105e-05, 'epoch': 0.07} + + 7%|▋ | 507/7378 [1:44:27<23:37:33, 12.38s/it] + 7%|▋ | 508/7378 [1:44:39<23:29:49, 12.31s/it] + +{'loss': 0.5082, 'learning_rate': 1.992127901373684e-05, 'epoch': 0.07} + + 7%|▋ | 508/7378 [1:44:39<23:29:49, 12.31s/it] + 7%|▋ | 509/7378 [1:44:52<23:29:00, 12.31s/it] + +{'loss': 0.5457, 'learning_rate': 1.992072828466158e-05, 'epoch': 0.07} + + 7%|▋ | 509/7378 [1:44:52<23:29:00, 12.31s/it] + 7%|▋ | 510/7378 [1:45:04<23:22:33, 12.25s/it] + +{'loss': 0.4618, 'learning_rate': 1.992017564352142e-05, 'epoch': 0.07} + + 7%|▋ | 510/7378 [1:45:04<23:22:33, 12.25s/it] + 7%|▋ | 511/7378 [1:45:16<23:18:12, 12.22s/it] + +{'loss': 0.5181, 'learning_rate': 1.991962109042287e-05, 'epoch': 0.07} + + 7%|▋ | 511/7378 [1:45:16<23:18:12, 12.22s/it] + 7%|▋ | 512/7378 [1:45:28<23:21:49, 12.25s/it] + +{'loss': 0.466, 'learning_rate': 1.9919064625472813e-05, 'epoch': 0.07} + + 7%|▋ | 512/7378 [1:45:28<23:21:49, 12.25s/it] + 7%|▋ | 513/7378 [1:45:40<23:15:50, 12.20s/it] + +{'loss': 0.6143, 'learning_rate': 1.99185062487785e-05, 'epoch': 0.07} + + 7%|▋ | 513/7378 [1:45:40<23:15:50, 12.20s/it] + 7%|▋ | 514/7378 [1:45:53<23:20:49, 12.24s/it] + +{'loss': 0.5527, 'learning_rate': 1.9917945960447546e-05, 'epoch': 0.07} + + 7%|▋ | 514/7378 [1:45:53<23:20:49, 12.24s/it] + 7%|▋ | 515/7378 [1:46:05<23:31:39, 12.34s/it] + +{'loss': 0.5031, 'learning_rate': 1.991738376058794e-05, 'epoch': 0.07} + + 7%|▋ | 515/7378 [1:46:05<23:31:39, 12.34s/it] + 7%|▋ | 516/7378 [1:46:18<23:32:10, 12.35s/it] + +{'loss': 0.4898, 'learning_rate': 1.991681964930803e-05, 'epoch': 0.07} + + 7%|▋ | 516/7378 [1:46:18<23:32:10, 12.35s/it] + 7%|▋ | 517/7378 [1:46:30<23:22:51, 12.27s/it] + +{'loss': 0.4975, 'learning_rate': 1.9916253626716556e-05, 'epoch': 0.07} + + 7%|▋ | 517/7378 
[1:46:30<23:22:51, 12.27s/it] + 7%|▋ | 518/7378 [1:46:42<23:16:26, 12.21s/it] + +{'loss': 0.4599, 'learning_rate': 1.9915685692922592e-05, 'epoch': 0.07} + + 7%|▋ | 518/7378 [1:46:42<23:16:26, 12.21s/it] + 7%|▋ | 519/7378 [1:46:54<23:16:58, 12.22s/it] + +{'loss': 0.4806, 'learning_rate': 1.991511584803561e-05, 'epoch': 0.07} + + 7%|▋ | 519/7378 [1:46:54<23:16:58, 12.22s/it] + 7%|▋ | 520/7378 [1:47:07<23:24:59, 12.29s/it] + +{'loss': 0.534, 'learning_rate': 1.9914544092165436e-05, 'epoch': 0.07} + + 7%|▋ | 520/7378 [1:47:07<23:24:59, 12.29s/it] + 7%|▋ | 521/7378 [1:47:19<23:31:51, 12.35s/it] + +{'loss': 0.5152, 'learning_rate': 1.9913970425422265e-05, 'epoch': 0.07} + + 7%|▋ | 521/7378 [1:47:19<23:31:51, 12.35s/it] + 7%|▋ | 522/7378 [1:47:31<23:22:23, 12.27s/it] + +{'loss': 0.6205, 'learning_rate': 1.9913394847916662e-05, 'epoch': 0.07} + + 7%|▋ | 522/7378 [1:47:31<23:22:23, 12.27s/it] + 7%|▋ | 523/7378 [1:47:43<23:24:45, 12.30s/it] + +{'loss': 0.5451, 'learning_rate': 1.991281735975956e-05, 'epoch': 0.07} + + 7%|▋ | 523/7378 [1:47:43<23:24:45, 12.30s/it] + 7%|▋ | 524/7378 [1:47:56<23:21:53, 12.27s/it] + +{'loss': 0.4657, 'learning_rate': 1.9912237961062268e-05, 'epoch': 0.07} + + 7%|▋ | 524/7378 [1:47:56<23:21:53, 12.27s/it] + 7%|▋ | 525/7378 [1:48:08<23:32:54, 12.37s/it] + +{'loss': 0.4952, 'learning_rate': 1.9911656651936446e-05, 'epoch': 0.07} + + 7%|▋ | 525/7378 [1:48:08<23:32:54, 12.37s/it] + 7%|▋ | 526/7378 [1:48:21<23:41:27, 12.45s/it] + +{'loss': 0.5376, 'learning_rate': 1.9911073432494138e-05, 'epoch': 0.07} + + 7%|▋ | 526/7378 [1:48:21<23:41:27, 12.45s/it] + 7%|▋ | 527/7378 [1:48:33<23:37:56, 12.42s/it] + +{'loss': 0.5803, 'learning_rate': 1.991048830284775e-05, 'epoch': 0.07} + + 7%|▋ | 527/7378 [1:48:33<23:37:56, 12.42s/it] + 7%|▋ | 528/7378 [1:48:47<24:09:16, 12.69s/it] + +{'loss': 0.5248, 'learning_rate': 1.9909901263110053e-05, 'epoch': 0.07} + + 7%|▋ | 528/7378 [1:48:47<24:09:16, 12.69s/it] + 7%|▋ | 529/7378 [1:48:59<24:06:34, 12.67s/it] + +{'loss': 
0.5077, 'learning_rate': 1.9909312313394197e-05, 'epoch': 0.07} + + 7%|▋ | 529/7378 [1:48:59<24:06:34, 12.67s/it] + 7%|▋ | 530/7378 [1:49:12<23:53:48, 12.56s/it] + +{'loss': 0.4996, 'learning_rate': 1.9908721453813686e-05, 'epoch': 0.07} + + 7%|▋ | 530/7378 [1:49:12<23:53:48, 12.56s/it] + 7%|▋ | 531/7378 [1:49:24<23:46:16, 12.50s/it] + +{'loss': 0.4959, 'learning_rate': 1.9908128684482398e-05, 'epoch': 0.07} + + 7%|▋ | 531/7378 [1:49:24<23:46:16, 12.50s/it] + 7%|▋ | 532/7378 [1:49:37<23:51:27, 12.55s/it] + +{'loss': 0.463, 'learning_rate': 1.990753400551459e-05, 'epoch': 0.07} + + 7%|▋ | 532/7378 [1:49:37<23:51:27, 12.55s/it] + 7%|▋ | 533/7378 [1:49:49<23:37:29, 12.42s/it] + +{'loss': 0.5099, 'learning_rate': 1.9906937417024866e-05, 'epoch': 0.07} + + 7%|▋ | 533/7378 [1:49:49<23:37:29, 12.42s/it] + 7%|▋ | 534/7378 [1:50:01<23:30:19, 12.36s/it] + +{'loss': 0.5516, 'learning_rate': 1.9906338919128214e-05, 'epoch': 0.07} + + 7%|▋ | 534/7378 [1:50:01<23:30:19, 12.36s/it] + 7%|▋ | 535/7378 [1:50:13<23:17:02, 12.25s/it] + +{'loss': 0.5803, 'learning_rate': 1.9905738511939983e-05, 'epoch': 0.07} + + 7%|▋ | 535/7378 [1:50:13<23:17:02, 12.25s/it] + 7%|▋ | 536/7378 [1:50:26<23:31:17, 12.38s/it] + +{'loss': 0.6496, 'learning_rate': 1.9905136195575895e-05, 'epoch': 0.07} + + 7%|▋ | 536/7378 [1:50:26<23:31:17, 12.38s/it] + 7%|▋ | 537/7378 [1:50:38<23:27:51, 12.35s/it] + +{'loss': 0.5084, 'learning_rate': 1.9904531970152036e-05, 'epoch': 0.07} + + 7%|▋ | 537/7378 [1:50:38<23:27:51, 12.35s/it] + 7%|▋ | 538/7378 [1:50:50<23:27:01, 12.34s/it] + +{'loss': 0.5826, 'learning_rate': 1.990392583578486e-05, 'epoch': 0.07} + + 7%|▋ | 538/7378 [1:50:50<23:27:01, 12.34s/it] + 7%|▋ | 539/7378 [1:51:03<23:26:26, 12.34s/it] + +{'loss': 0.5343, 'learning_rate': 1.990331779259119e-05, 'epoch': 0.07} + + 7%|▋ | 539/7378 [1:51:03<23:26:26, 12.34s/it] + 7%|▋ | 540/7378 [1:51:15<23:34:27, 12.41s/it] + +{'loss': 0.4951, 'learning_rate': 1.9902707840688217e-05, 'epoch': 0.07} + + 7%|▋ | 540/7378 
[1:51:15<23:34:27, 12.41s/it] + 7%|▋ | 541/7378 [1:51:28<23:41:03, 12.47s/it] + +{'loss': 0.5465, 'learning_rate': 1.9902095980193503e-05, 'epoch': 0.07} + + 7%|▋ | 541/7378 [1:51:28<23:41:03, 12.47s/it] + 7%|▋ | 542/7378 [1:51:40<23:26:34, 12.35s/it] + +{'loss': 0.4188, 'learning_rate': 1.990148221122497e-05, 'epoch': 0.07} + + 7%|▋ | 542/7378 [1:51:40<23:26:34, 12.35s/it] + 7%|▋ | 543/7378 [1:51:52<23:11:43, 12.22s/it] + +{'loss': 0.4762, 'learning_rate': 1.9900866533900914e-05, 'epoch': 0.07} + + 7%|▋ | 543/7378 [1:51:52<23:11:43, 12.22s/it] + 7%|▋ | 544/7378 [1:52:04<23:27:09, 12.35s/it] + +{'loss': 0.5619, 'learning_rate': 1.9900248948339996e-05, 'epoch': 0.07} + + 7%|▋ | 544/7378 [1:52:04<23:27:09, 12.35s/it] + 7%|▋ | 545/7378 [1:52:16<23:19:12, 12.29s/it] + +{'loss': 0.4939, 'learning_rate': 1.9899629454661246e-05, 'epoch': 0.07} + + 7%|▋ | 545/7378 [1:52:16<23:19:12, 12.29s/it] + 7%|▋ | 546/7378 [1:52:29<23:10:52, 12.21s/it] + +{'loss': 0.5187, 'learning_rate': 1.9899008052984065e-05, 'epoch': 0.07} + + 7%|▋ | 546/7378 [1:52:29<23:10:52, 12.21s/it] + 7%|▋ | 547/7378 [1:52:41<23:13:45, 12.24s/it] + +{'loss': 0.4755, 'learning_rate': 1.9898384743428213e-05, 'epoch': 0.07} + + 7%|▋ | 547/7378 [1:52:41<23:13:45, 12.24s/it] + 7%|▋ | 548/7378 [1:52:53<23:05:11, 12.17s/it] + +{'loss': 0.4732, 'learning_rate': 1.9897759526113826e-05, 'epoch': 0.07} + + 7%|▋ | 548/7378 [1:52:53<23:05:11, 12.17s/it] + 7%|▋ | 549/7378 [1:53:05<23:06:57, 12.19s/it] + +{'loss': 0.5289, 'learning_rate': 1.989713240116141e-05, 'epoch': 0.07} + + 7%|▋ | 549/7378 [1:53:05<23:06:57, 12.19s/it] + 7%|▋ | 550/7378 [1:53:17<23:12:32, 12.24s/it] + +{'loss': 0.455, 'learning_rate': 1.9896503368691826e-05, 'epoch': 0.07} + + 7%|▋ | 550/7378 [1:53:17<23:12:32, 12.24s/it] + 7%|▋ | 551/7378 [1:53:30<23:22:24, 12.33s/it] + +{'loss': 0.4455, 'learning_rate': 1.9895872428826307e-05, 'epoch': 0.07} + + 7%|▋ | 551/7378 [1:53:30<23:22:24, 12.33s/it] + 7%|▋ | 552/7378 [1:53:42<23:25:59, 12.36s/it] + 
+{'loss': 0.5114, 'learning_rate': 1.989523958168647e-05, 'epoch': 0.07} + + 7%|▋ | 552/7378 [1:53:42<23:25:59, 12.36s/it] + 7%|▋ | 553/7378 [1:53:55<23:25:23, 12.36s/it] + +{'loss': 0.507, 'learning_rate': 1.9894604827394273e-05, 'epoch': 0.07} + + 7%|▋ | 553/7378 [1:53:55<23:25:23, 12.36s/it] + 8%|▊ | 554/7378 [1:54:07<23:21:37, 12.32s/it] + +{'loss': 0.5711, 'learning_rate': 1.9893968166072067e-05, 'epoch': 0.08} + + 8%|▊ | 554/7378 [1:54:07<23:21:37, 12.32s/it] + 8%|▊ | 555/7378 [1:54:19<23:08:07, 12.21s/it] + +{'loss': 0.4643, 'learning_rate': 1.989332959784255e-05, 'epoch': 0.08} + + 8%|▊ | 555/7378 [1:54:19<23:08:07, 12.21s/it] + 8%|▊ | 556/7378 [1:54:31<23:02:10, 12.16s/it] + +{'loss': 0.5242, 'learning_rate': 1.9892689122828797e-05, 'epoch': 0.08} + + 8%|▊ | 556/7378 [1:54:31<23:02:10, 12.16s/it] + 8%|▊ | 557/7378 [1:54:43<23:12:45, 12.25s/it] + +{'loss': 0.5198, 'learning_rate': 1.989204674115425e-05, 'epoch': 0.08} + + 8%|▊ | 557/7378 [1:54:43<23:12:45, 12.25s/it] + 8%|▊ | 558/7378 [1:54:56<23:12:56, 12.25s/it] + +{'loss': 0.4676, 'learning_rate': 1.989140245294272e-05, 'epoch': 0.08} + + 8%|▊ | 558/7378 [1:54:56<23:12:56, 12.25s/it] + 8%|▊ | 559/7378 [1:55:08<23:12:26, 12.25s/it] + +{'loss': 0.5924, 'learning_rate': 1.9890756258318383e-05, 'epoch': 0.08} + + 8%|▊ | 559/7378 [1:55:08<23:12:26, 12.25s/it] + 8%|▊ | 560/7378 [1:55:21<23:26:36, 12.38s/it] + +{'loss': 0.4926, 'learning_rate': 1.9890108157405782e-05, 'epoch': 0.08} + + 8%|▊ | 560/7378 [1:55:21<23:26:36, 12.38s/it] + 8%|▊ | 561/7378 [1:55:40<27:18:13, 14.42s/it] + +{'loss': 0.4735, 'learning_rate': 1.9889458150329827e-05, 'epoch': 0.08} + + 8%|▊ | 561/7378 [1:55:40<27:18:13, 14.42s/it] + 8%|▊ | 562/7378 [1:55:52<26:08:44, 13.81s/it] + +{'loss': 0.5232, 'learning_rate': 1.98888062372158e-05, 'epoch': 0.08} + + 8%|▊ | 562/7378 [1:55:52<26:08:44, 13.81s/it] + 8%|▊ | 563/7378 [1:56:04<25:10:26, 13.30s/it] + +{'loss': 0.5719, 'learning_rate': 1.988815241818934e-05, 'epoch': 0.08} + + 8%|▊ | 563/7378 
[1:56:04<25:10:26, 13.30s/it] + 8%|▊ | 564/7378 [1:56:17<24:48:27, 13.11s/it] + +{'loss': 0.5988, 'learning_rate': 1.9887496693376473e-05, 'epoch': 0.08} + + 8%|▊ | 564/7378 [1:56:17<24:48:27, 13.11s/it] + 8%|▊ | 565/7378 [1:56:29<24:24:31, 12.90s/it] + +{'loss': 0.5266, 'learning_rate': 1.9886839062903568e-05, 'epoch': 0.08} + + 8%|▊ | 565/7378 [1:56:29<24:24:31, 12.90s/it] + 8%|▊ | 566/7378 [1:56:41<23:50:44, 12.60s/it] + +{'loss': 0.5273, 'learning_rate': 1.988617952689738e-05, 'epoch': 0.08} + + 8%|▊ | 566/7378 [1:56:41<23:50:44, 12.60s/it] + 8%|▊ | 567/7378 [1:56:57<25:38:30, 13.55s/it] + +{'loss': 0.5008, 'learning_rate': 1.988551808548502e-05, 'epoch': 0.08} + + 8%|▊ | 567/7378 [1:56:57<25:38:30, 13.55s/it] + 8%|▊ | 568/7378 [1:57:09<25:00:30, 13.22s/it] + +{'loss': 0.5605, 'learning_rate': 1.988485473879397e-05, 'epoch': 0.08} + + 8%|▊ | 568/7378 [1:57:09<25:00:30, 13.22s/it] + 8%|▊ | 569/7378 [1:57:22<24:28:18, 12.94s/it] + +{'loss': 0.4763, 'learning_rate': 1.988418948695208e-05, 'epoch': 0.08} + + 8%|▊ | 569/7378 [1:57:22<24:28:18, 12.94s/it] + 8%|▊ | 570/7378 [1:57:34<23:55:26, 12.65s/it] + +{'loss': 0.5534, 'learning_rate': 1.988352233008757e-05, 'epoch': 0.08} + + 8%|▊ | 570/7378 [1:57:34<23:55:26, 12.65s/it] + 8%|▊ | 571/7378 [1:57:46<23:37:02, 12.49s/it] + +{'loss': 0.5771, 'learning_rate': 1.9882853268329027e-05, 'epoch': 0.08} + + 8%|▊ | 571/7378 [1:57:46<23:37:02, 12.49s/it] + 8%|▊ | 572/7378 [1:57:58<23:34:49, 12.47s/it] + +{'loss': 0.5604, 'learning_rate': 1.9882182301805393e-05, 'epoch': 0.08} + + 8%|▊ | 572/7378 [1:57:58<23:34:49, 12.47s/it] + 8%|▊ | 573/7378 [1:58:10<23:24:30, 12.38s/it] + +{'loss': 0.5238, 'learning_rate': 1.988150943064599e-05, 'epoch': 0.08} + + 8%|▊ | 573/7378 [1:58:10<23:24:30, 12.38s/it] + 8%|▊ | 574/7378 [1:58:23<23:16:19, 12.31s/it] + +{'loss': 0.5646, 'learning_rate': 1.988083465498051e-05, 'epoch': 0.08} + + 8%|▊ | 574/7378 [1:58:23<23:16:19, 12.31s/it] + 8%|▊ | 575/7378 [1:58:35<23:22:05, 12.37s/it] + +{'loss': 
0.5469, 'learning_rate': 1.9880157974938994e-05, 'epoch': 0.08} + + 8%|▊ | 575/7378 [1:58:35<23:22:05, 12.37s/it] + 8%|▊ | 576/7378 [1:58:48<23:24:55, 12.39s/it] + +{'loss': 0.4475, 'learning_rate': 1.9879479390651867e-05, 'epoch': 0.08} + + 8%|▊ | 576/7378 [1:58:48<23:24:55, 12.39s/it] + 8%|▊ | 577/7378 [1:59:00<23:28:07, 12.42s/it] + +{'loss': 0.5845, 'learning_rate': 1.987879890224992e-05, 'epoch': 0.08} + + 8%|▊ | 577/7378 [1:59:00<23:28:07, 12.42s/it] + 8%|▊ | 578/7378 [1:59:13<23:37:58, 12.51s/it] + +{'loss': 0.4807, 'learning_rate': 1.98781165098643e-05, 'epoch': 0.08} + + 8%|▊ | 578/7378 [1:59:13<23:37:58, 12.51s/it] + 8%|▊ | 579/7378 [1:59:25<23:33:28, 12.47s/it] + +{'loss': 0.5284, 'learning_rate': 1.987743221362653e-05, 'epoch': 0.08} + + 8%|▊ | 579/7378 [1:59:25<23:33:28, 12.47s/it] + 8%|▊ | 580/7378 [1:59:38<23:35:47, 12.50s/it] + +{'loss': 0.5875, 'learning_rate': 1.9876746013668494e-05, 'epoch': 0.08} + + 8%|▊ | 580/7378 [1:59:38<23:35:47, 12.50s/it] + 8%|▊ | 581/7378 [1:59:50<23:39:33, 12.53s/it] + +{'loss': 0.4578, 'learning_rate': 1.987605791012245e-05, 'epoch': 0.08} + + 8%|▊ | 581/7378 [1:59:50<23:39:33, 12.53s/it] + 8%|▊ | 582/7378 [2:00:03<23:29:25, 12.44s/it] + +{'loss': 0.5203, 'learning_rate': 1.9875367903121022e-05, 'epoch': 0.08} + + 8%|▊ | 582/7378 [2:00:03<23:29:25, 12.44s/it] + 8%|▊ | 583/7378 [2:00:15<23:25:01, 12.41s/it] + +{'loss': 0.4913, 'learning_rate': 1.987467599279719e-05, 'epoch': 0.08} + + 8%|▊ | 583/7378 [2:00:15<23:25:01, 12.41s/it] + 8%|▊ | 584/7378 [2:00:27<23:29:58, 12.45s/it] + +{'loss': 0.5128, 'learning_rate': 1.9873982179284316e-05, 'epoch': 0.08} + + 8%|▊ | 584/7378 [2:00:27<23:29:58, 12.45s/it] + 8%|▊ | 585/7378 [2:00:40<23:30:52, 12.46s/it] + +{'loss': 0.4782, 'learning_rate': 1.9873286462716118e-05, 'epoch': 0.08} + + 8%|▊ | 585/7378 [2:00:40<23:30:52, 12.46s/it] + 8%|▊ | 586/7378 [2:00:53<23:35:20, 12.50s/it] + +{'loss': 0.528, 'learning_rate': 1.9872588843226687e-05, 'epoch': 0.08} + + 8%|▊ | 586/7378 
[2:00:53<23:35:20, 12.50s/it] + 8%|▊ | 587/7378 [2:01:05<23:32:23, 12.48s/it] + +{'loss': 0.5291, 'learning_rate': 1.9871889320950476e-05, 'epoch': 0.08} + + 8%|▊ | 587/7378 [2:01:05<23:32:23, 12.48s/it] + 8%|▊ | 588/7378 [2:01:17<23:27:17, 12.44s/it] + +{'loss': 0.5167, 'learning_rate': 1.9871187896022305e-05, 'epoch': 0.08} + + 8%|▊ | 588/7378 [2:01:17<23:27:17, 12.44s/it] + 8%|▊ | 589/7378 [2:01:30<23:24:48, 12.42s/it] + +{'loss': 0.5691, 'learning_rate': 1.987048456857737e-05, 'epoch': 0.08} + + 8%|▊ | 589/7378 [2:01:30<23:24:48, 12.42s/it] + 8%|▊ | 590/7378 [2:01:42<23:21:34, 12.39s/it] + +{'loss': 0.5383, 'learning_rate': 1.9869779338751217e-05, 'epoch': 0.08} + + 8%|▊ | 590/7378 [2:01:42<23:21:34, 12.39s/it] + 8%|▊ | 591/7378 [2:01:55<23:38:32, 12.54s/it] + +{'loss': 0.5211, 'learning_rate': 1.986907220667978e-05, 'epoch': 0.08} + + 8%|▊ | 591/7378 [2:01:55<23:38:32, 12.54s/it] + 8%|▊ | 592/7378 [2:02:07<23:37:55, 12.54s/it] + +{'loss': 0.4829, 'learning_rate': 1.9868363172499334e-05, 'epoch': 0.08} + + 8%|▊ | 592/7378 [2:02:07<23:37:55, 12.54s/it] + 8%|▊ | 593/7378 [2:02:20<23:28:23, 12.45s/it] + +{'loss': 0.5039, 'learning_rate': 1.986765223634654e-05, 'epoch': 0.08} + + 8%|▊ | 593/7378 [2:02:20<23:28:23, 12.45s/it] + 8%|▊ | 594/7378 [2:02:32<23:24:12, 12.42s/it] + +{'loss': 0.4484, 'learning_rate': 1.986693939835842e-05, 'epoch': 0.08} + + 8%|▊ | 594/7378 [2:02:32<23:24:12, 12.42s/it] + 8%|▊ | 595/7378 [2:02:44<23:18:03, 12.37s/it] + +{'loss': 0.5451, 'learning_rate': 1.9866224658672365e-05, 'epoch': 0.08} + + 8%|▊ | 595/7378 [2:02:44<23:18:03, 12.37s/it] + 8%|▊ | 596/7378 [2:02:57<23:24:59, 12.43s/it] + +{'loss': 0.4931, 'learning_rate': 1.9865508017426127e-05, 'epoch': 0.08} + + 8%|▊ | 596/7378 [2:02:57<23:24:59, 12.43s/it] + 8%|▊ | 597/7378 [2:03:09<23:27:45, 12.46s/it] + +{'loss': 0.5206, 'learning_rate': 1.986478947475783e-05, 'epoch': 0.08} + + 8%|▊ | 597/7378 [2:03:09<23:27:45, 12.46s/it] + 8%|▊ | 598/7378 [2:03:22<23:23:35, 12.42s/it] + +{'loss': 
0.5157, 'learning_rate': 1.9864069030805955e-05, 'epoch': 0.08} + + 8%|▊ | 598/7378 [2:03:22<23:23:35, 12.42s/it] + 8%|▊ | 599/7378 [2:03:34<23:14:45, 12.34s/it] + +{'loss': 0.4997, 'learning_rate': 1.9863346685709365e-05, 'epoch': 0.08} + + 8%|▊ | 599/7378 [2:03:34<23:14:45, 12.34s/it] + 8%|▊ | 600/7378 [2:03:46<23:00:16, 12.22s/it] + +{'loss': 0.5676, 'learning_rate': 1.9862622439607276e-05, 'epoch': 0.08} + + 8%|▊ | 600/7378 [2:03:46<23:00:16, 12.22s/it] + 8%|▊ | 601/7378 [2:03:58<22:55:53, 12.18s/it] + +{'loss': 0.5527, 'learning_rate': 1.9861896292639274e-05, 'epoch': 0.08} + + 8%|▊ | 601/7378 [2:03:58<22:55:53, 12.18s/it] + 8%|▊ | 602/7378 [2:04:10<23:01:47, 12.24s/it] + +{'loss': 0.4577, 'learning_rate': 1.9861168244945314e-05, 'epoch': 0.08} + + 8%|▊ | 602/7378 [2:04:10<23:01:47, 12.24s/it] + 8%|▊ | 603/7378 [2:04:22<22:57:32, 12.20s/it] + +{'loss': 0.4401, 'learning_rate': 1.986043829666572e-05, 'epoch': 0.08} + + 8%|▊ | 603/7378 [2:04:22<22:57:32, 12.20s/it] + 8%|▊ | 604/7378 [2:04:34<22:54:25, 12.17s/it] + +{'loss': 0.5043, 'learning_rate': 1.985970644794117e-05, 'epoch': 0.08} + + 8%|▊ | 604/7378 [2:04:34<22:54:25, 12.17s/it] + 8%|▊ | 605/7378 [2:04:47<22:55:21, 12.18s/it] + +{'loss': 0.541, 'learning_rate': 1.985897269891272e-05, 'epoch': 0.08} + + 8%|▊ | 605/7378 [2:04:47<22:55:21, 12.18s/it] + 8%|▊ | 606/7378 [2:04:59<23:04:32, 12.27s/it] + +{'loss': 0.544, 'learning_rate': 1.9858237049721793e-05, 'epoch': 0.08} + + 8%|▊ | 606/7378 [2:04:59<23:04:32, 12.27s/it] + 8%|▊ | 607/7378 [2:05:12<23:18:24, 12.39s/it] + +{'loss': 0.5396, 'learning_rate': 1.9857499500510167e-05, 'epoch': 0.08} + + 8%|▊ | 607/7378 [2:05:12<23:18:24, 12.39s/it] + 8%|▊ | 608/7378 [2:05:24<23:18:15, 12.39s/it] + +{'loss': 0.5579, 'learning_rate': 1.9856760051419996e-05, 'epoch': 0.08} + + 8%|▊ | 608/7378 [2:05:24<23:18:15, 12.39s/it] + 8%|▊ | 609/7378 [2:05:36<23:13:02, 12.35s/it] + +{'loss': 0.5687, 'learning_rate': 1.98560187025938e-05, 'epoch': 0.08} + + 8%|▊ | 609/7378 
[2:05:36<23:13:02, 12.35s/it] + 8%|▊ | 610/7378 [2:05:49<23:06:34, 12.29s/it] + +{'loss': 0.5003, 'learning_rate': 1.985527545417446e-05, 'epoch': 0.08} + + 8%|▊ | 610/7378 [2:05:49<23:06:34, 12.29s/it] + 8%|▊ | 611/7378 [2:06:01<22:57:23, 12.21s/it] + +{'loss': 0.508, 'learning_rate': 1.985453030630522e-05, 'epoch': 0.08} + + 8%|▊ | 611/7378 [2:06:01<22:57:23, 12.21s/it] + 8%|▊ | 612/7378 [2:06:13<22:52:45, 12.17s/it] + +{'loss': 0.5012, 'learning_rate': 1.9853783259129703e-05, 'epoch': 0.08} + + 8%|▊ | 612/7378 [2:06:13<22:52:45, 12.17s/it] + 8%|▊ | 613/7378 [2:06:25<22:57:48, 12.22s/it] + +{'loss': 0.4797, 'learning_rate': 1.985303431279189e-05, 'epoch': 0.08} + + 8%|▊ | 613/7378 [2:06:25<22:57:48, 12.22s/it] + 8%|▊ | 614/7378 [2:06:37<22:55:33, 12.20s/it] + +{'loss': 0.4618, 'learning_rate': 1.9852283467436124e-05, 'epoch': 0.08} + + 8%|▊ | 614/7378 [2:06:37<22:55:33, 12.20s/it] + 8%|▊ | 615/7378 [2:06:50<23:16:39, 12.39s/it] + +{'loss': 0.5278, 'learning_rate': 1.9851530723207125e-05, 'epoch': 0.08} + + 8%|▊ | 615/7378 [2:06:50<23:16:39, 12.39s/it] + 8%|▊ | 616/7378 [2:07:02<23:12:04, 12.35s/it] + +{'loss': 0.551, 'learning_rate': 1.9850776080249966e-05, 'epoch': 0.08} + + 8%|▊ | 616/7378 [2:07:02<23:12:04, 12.35s/it] + 8%|▊ | 617/7378 [2:07:15<23:13:42, 12.37s/it] + +{'loss': 0.5739, 'learning_rate': 1.9850019538710098e-05, 'epoch': 0.08} + + 8%|▊ | 617/7378 [2:07:15<23:13:42, 12.37s/it] + 8%|▊ | 618/7378 [2:07:27<22:59:57, 12.25s/it] + +{'loss': 0.4852, 'learning_rate': 1.984926109873333e-05, 'epoch': 0.08} + + 8%|▊ | 618/7378 [2:07:27<22:59:57, 12.25s/it] + 8%|▊ | 619/7378 [2:07:39<23:18:52, 12.42s/it] + +{'loss': 0.5371, 'learning_rate': 1.984850076046584e-05, 'epoch': 0.08} + + 8%|▊ | 619/7378 [2:07:39<23:18:52, 12.42s/it] + 8%|▊ | 620/7378 [2:07:52<23:30:22, 12.52s/it] + +{'loss': 0.4963, 'learning_rate': 1.9847738524054172e-05, 'epoch': 0.08} + + 8%|▊ | 620/7378 [2:07:52<23:30:22, 12.52s/it] + 8%|▊ | 621/7378 [2:08:05<23:29:10, 12.51s/it] + +{'loss': 
0.5531, 'learning_rate': 1.9846974389645232e-05, 'epoch': 0.08} + + 8%|▊ | 621/7378 [2:08:05<23:29:10, 12.51s/it] + 8%|▊ | 622/7378 [2:08:17<23:22:41, 12.46s/it] + +{'loss': 0.4743, 'learning_rate': 1.98462083573863e-05, 'epoch': 0.08} + + 8%|▊ | 622/7378 [2:08:17<23:22:41, 12.46s/it] + 8%|▊ | 623/7378 [2:08:29<23:13:39, 12.38s/it] + +{'loss': 0.5673, 'learning_rate': 1.984544042742501e-05, 'epoch': 0.08} + + 8%|▊ | 623/7378 [2:08:29<23:13:39, 12.38s/it] + 8%|▊ | 624/7378 [2:08:41<23:08:29, 12.33s/it] + +{'loss': 0.5331, 'learning_rate': 1.9844670599909375e-05, 'epoch': 0.08} + + 8%|▊ | 624/7378 [2:08:41<23:08:29, 12.33s/it] + 8%|▊ | 625/7378 [2:08:54<23:09:12, 12.34s/it] + +{'loss': 0.53, 'learning_rate': 1.9843898874987765e-05, 'epoch': 0.08} + + 8%|▊ | 625/7378 [2:08:54<23:09:12, 12.34s/it] + 8%|▊ | 626/7378 [2:09:06<23:09:30, 12.35s/it] + +{'loss': 0.6037, 'learning_rate': 1.9843125252808914e-05, 'epoch': 0.08} + + 8%|▊ | 626/7378 [2:09:06<23:09:30, 12.35s/it] + 8%|▊ | 627/7378 [2:09:18<22:59:42, 12.26s/it] + +{'loss': 0.4565, 'learning_rate': 1.9842349733521932e-05, 'epoch': 0.08} + + 8%|▊ | 627/7378 [2:09:18<22:59:42, 12.26s/it] + 9%|▊ | 628/7378 [2:09:31<23:09:40, 12.35s/it] + +{'loss': 0.5326, 'learning_rate': 1.9841572317276285e-05, 'epoch': 0.09} + + 9%|▊ | 628/7378 [2:09:31<23:09:40, 12.35s/it] + 9%|▊ | 629/7378 [2:09:43<22:53:08, 12.21s/it] + +{'loss': 0.4958, 'learning_rate': 1.984079300422181e-05, 'epoch': 0.09} + + 9%|▊ | 629/7378 [2:09:43<22:53:08, 12.21s/it] + 9%|▊ | 630/7378 [2:09:55<22:53:43, 12.21s/it] + +{'loss': 0.6333, 'learning_rate': 1.9840011794508702e-05, 'epoch': 0.09} + + 9%|▊ | 630/7378 [2:09:55<22:53:43, 12.21s/it] + 9%|▊ | 631/7378 [2:10:07<22:54:18, 12.22s/it] + +{'loss': 0.5726, 'learning_rate': 1.983922868828753e-05, 'epoch': 0.09} + + 9%|▊ | 631/7378 [2:10:07<22:54:18, 12.22s/it] + 9%|▊ | 632/7378 [2:10:19<22:57:17, 12.25s/it] + +{'loss': 0.5534, 'learning_rate': 1.9838443685709228e-05, 'epoch': 0.09} + + 9%|▊ | 632/7378 
[2:10:19<22:57:17, 12.25s/it] + 9%|▊ | 633/7378 [2:10:32<22:56:08, 12.24s/it] + +{'loss': 0.4552, 'learning_rate': 1.983765678692509e-05, 'epoch': 0.09} + + 9%|▊ | 633/7378 [2:10:32<22:56:08, 12.24s/it] + 9%|▊ | 634/7378 [2:10:44<22:56:17, 12.24s/it] + +{'loss': 0.5702, 'learning_rate': 1.9836867992086777e-05, 'epoch': 0.09} + + 9%|▊ | 634/7378 [2:10:44<22:56:17, 12.24s/it] + 9%|▊ | 635/7378 [2:10:56<22:59:12, 12.27s/it] + +{'loss': 0.5002, 'learning_rate': 1.983607730134632e-05, 'epoch': 0.09} + + 9%|▊ | 635/7378 [2:10:56<22:59:12, 12.27s/it] + 9%|▊ | 636/7378 [2:11:09<23:00:10, 12.28s/it] + +{'loss': 0.5223, 'learning_rate': 1.9835284714856115e-05, 'epoch': 0.09} + + 9%|▊ | 636/7378 [2:11:09<23:00:10, 12.28s/it] + 9%|▊ | 637/7378 [2:11:21<22:50:23, 12.20s/it] + +{'loss': 0.5005, 'learning_rate': 1.983449023276891e-05, 'epoch': 0.09} + + 9%|▊ | 637/7378 [2:11:21<22:50:23, 12.20s/it] + 9%|▊ | 638/7378 [2:11:33<23:01:59, 12.30s/it] + +{'loss': 0.5096, 'learning_rate': 1.983369385523784e-05, 'epoch': 0.09} + + 9%|▊ | 638/7378 [2:11:33<23:01:59, 12.30s/it] + 9%|▊ | 639/7378 [2:11:45<22:56:39, 12.26s/it] + +{'loss': 0.5334, 'learning_rate': 1.983289558241639e-05, 'epoch': 0.09} + + 9%|▊ | 639/7378 [2:11:45<22:56:39, 12.26s/it] + 9%|▊ | 640/7378 [2:11:58<22:58:32, 12.28s/it] + +{'loss': 0.5163, 'learning_rate': 1.9832095414458414e-05, 'epoch': 0.09} + + 9%|▊ | 640/7378 [2:11:58<22:58:32, 12.28s/it] + 9%|▊ | 641/7378 [2:12:10<22:56:25, 12.26s/it] + +{'loss': 0.5162, 'learning_rate': 1.9831293351518136e-05, 'epoch': 0.09} + + 9%|▊ | 641/7378 [2:12:10<22:56:25, 12.26s/it] + 9%|▊ | 642/7378 [2:12:22<23:01:27, 12.31s/it] + +{'loss': 0.6151, 'learning_rate': 1.9830489393750132e-05, 'epoch': 0.09} + + 9%|▊ | 642/7378 [2:12:22<23:01:27, 12.31s/it] + 9%|▊ | 643/7378 [2:12:35<23:08:18, 12.37s/it] + +{'loss': 0.4759, 'learning_rate': 1.982968354130936e-05, 'epoch': 0.09} + + 9%|▊ | 643/7378 [2:12:35<23:08:18, 12.37s/it] + 9%|▊ | 644/7378 [2:12:49<24:00:25, 12.83s/it] + +{'loss': 
0.5156, 'learning_rate': 1.982887579435113e-05, 'epoch': 0.09} + + 9%|▊ | 644/7378 [2:12:49<24:00:25, 12.83s/it] + 9%|▊ | 645/7378 [2:13:01<23:36:43, 12.62s/it] + +{'loss': 0.45, 'learning_rate': 1.9828066153031133e-05, 'epoch': 0.09} + + 9%|▊ | 645/7378 [2:13:01<23:36:43, 12.62s/it] + 9%|▉ | 646/7378 [2:13:13<23:16:34, 12.45s/it] + +{'loss': 0.4755, 'learning_rate': 1.98272546175054e-05, 'epoch': 0.09} + + 9%|▉ | 646/7378 [2:13:13<23:16:34, 12.45s/it] + 9%|▉ | 647/7378 [2:13:26<23:24:40, 12.52s/it] + +{'loss': 0.4441, 'learning_rate': 1.9826441187930356e-05, 'epoch': 0.09} + + 9%|▉ | 647/7378 [2:13:26<23:24:40, 12.52s/it] + 9%|▉ | 648/7378 [2:13:38<23:10:22, 12.40s/it] + +{'loss': 0.5206, 'learning_rate': 1.982562586446276e-05, 'epoch': 0.09} + + 9%|▉ | 648/7378 [2:13:38<23:10:22, 12.40s/it] + 9%|▉ | 649/7378 [2:13:50<23:14:11, 12.43s/it] + +{'loss': 0.5172, 'learning_rate': 1.9824808647259775e-05, 'epoch': 0.09} + + 9%|▉ | 649/7378 [2:13:50<23:14:11, 12.43s/it] + 9%|▉ | 650/7378 [2:14:02<23:08:16, 12.38s/it] + +{'loss': 0.5692, 'learning_rate': 1.9823989536478887e-05, 'epoch': 0.09} + + 9%|▉ | 650/7378 [2:14:02<23:08:16, 12.38s/it] + 9%|▉ | 651/7378 [2:14:15<23:04:49, 12.35s/it] + +{'loss': 0.4886, 'learning_rate': 1.982316853227798e-05, 'epoch': 0.09} + + 9%|▉ | 651/7378 [2:14:15<23:04:49, 12.35s/it] + 9%|▉ | 652/7378 [2:14:27<23:08:06, 12.38s/it] + +{'loss': 0.477, 'learning_rate': 1.9822345634815278e-05, 'epoch': 0.09} + + 9%|▉ | 652/7378 [2:14:27<23:08:06, 12.38s/it] + 9%|▉ | 653/7378 [2:14:39<23:05:09, 12.36s/it] + +{'loss': 0.5578, 'learning_rate': 1.9821520844249388e-05, 'epoch': 0.09} + + 9%|▉ | 653/7378 [2:14:39<23:05:09, 12.36s/it] + 9%|▉ | 654/7378 [2:14:52<23:18:01, 12.47s/it] + +{'loss': 0.5714, 'learning_rate': 1.982069416073928e-05, 'epoch': 0.09} + + 9%|▉ | 654/7378 [2:14:52<23:18:01, 12.47s/it] + 9%|▉ | 655/7378 [2:15:04<23:09:40, 12.40s/it] + +{'loss': 0.5178, 'learning_rate': 1.9819865584444274e-05, 'epoch': 0.09} + + 9%|▉ | 655/7378 
[2:15:04<23:09:40, 12.40s/it] + 9%|▉ | 656/7378 [2:15:16<22:48:00, 12.21s/it] + +{'loss': 0.5491, 'learning_rate': 1.9819035115524076e-05, 'epoch': 0.09} + + 9%|▉ | 656/7378 [2:15:16<22:48:00, 12.21s/it] + 9%|▉ | 657/7378 [2:15:29<23:02:26, 12.34s/it] + +{'loss': 0.4742, 'learning_rate': 1.9818202754138737e-05, 'epoch': 0.09} + + 9%|▉ | 657/7378 [2:15:29<23:02:26, 12.34s/it] + 9%|▉ | 658/7378 [2:15:41<22:57:29, 12.30s/it] + +{'loss': 0.5568, 'learning_rate': 1.9817368500448685e-05, 'epoch': 0.09} + + 9%|▉ | 658/7378 [2:15:41<22:57:29, 12.30s/it] + 9%|▉ | 659/7378 [2:15:53<22:55:44, 12.29s/it] + +{'loss': 0.4998, 'learning_rate': 1.981653235461471e-05, 'epoch': 0.09} + + 9%|▉ | 659/7378 [2:15:53<22:55:44, 12.29s/it] + 9%|▉ | 660/7378 [2:16:05<22:47:04, 12.21s/it] + +{'loss': 0.5551, 'learning_rate': 1.9815694316797967e-05, 'epoch': 0.09} + + 9%|▉ | 660/7378 [2:16:05<22:47:04, 12.21s/it] + 9%|▉ | 661/7378 [2:16:17<22:44:45, 12.19s/it] + +{'loss': 0.4814, 'learning_rate': 1.9814854387159973e-05, 'epoch': 0.09} + + 9%|▉ | 661/7378 [2:16:17<22:44:45, 12.19s/it] + 9%|▉ | 662/7378 [2:16:30<22:52:12, 12.26s/it] + +{'loss': 0.5383, 'learning_rate': 1.9814012565862607e-05, 'epoch': 0.09} + + 9%|▉ | 662/7378 [2:16:30<22:52:12, 12.26s/it] + 9%|▉ | 663/7378 [2:16:42<22:47:32, 12.22s/it] + +{'loss': 0.5112, 'learning_rate': 1.9813168853068126e-05, 'epoch': 0.09} + + 9%|▉ | 663/7378 [2:16:42<22:47:32, 12.22s/it] + 9%|▉ | 664/7378 [2:16:54<22:47:33, 12.22s/it] + +{'loss': 0.4744, 'learning_rate': 1.9812323248939134e-05, 'epoch': 0.09} + + 9%|▉ | 664/7378 [2:16:54<22:47:33, 12.22s/it] + 9%|▉ | 665/7378 [2:17:07<22:49:38, 12.24s/it] + +{'loss': 0.5238, 'learning_rate': 1.981147575363861e-05, 'epoch': 0.09} + + 9%|▉ | 665/7378 [2:17:07<22:49:38, 12.24s/it] + 9%|▉ | 666/7378 [2:17:19<22:43:55, 12.19s/it] + +{'loss': 0.5293, 'learning_rate': 1.9810626367329903e-05, 'epoch': 0.09} + + 9%|▉ | 666/7378 [2:17:19<22:43:55, 12.19s/it] + 9%|▉ | 667/7378 [2:17:31<22:49:22, 12.24s/it] + 
+{'loss': 0.5378, 'learning_rate': 1.980977509017671e-05, 'epoch': 0.09} + + 9%|▉ | 667/7378 [2:17:31<22:49:22, 12.24s/it] + 9%|▉ | 668/7378 [2:17:44<23:06:48, 12.40s/it] + +{'loss': 0.5695, 'learning_rate': 1.9808921922343104e-05, 'epoch': 0.09} + + 9%|▉ | 668/7378 [2:17:44<23:06:48, 12.40s/it] + 9%|▉ | 669/7378 [2:17:56<23:05:58, 12.40s/it] + +{'loss': 0.4942, 'learning_rate': 1.980806686399352e-05, 'epoch': 0.09} + + 9%|▉ | 669/7378 [2:17:56<23:05:58, 12.40s/it] + 9%|▉ | 670/7378 [2:18:08<22:56:20, 12.31s/it] + +{'loss': 0.5389, 'learning_rate': 1.9807209915292754e-05, 'epoch': 0.09} + + 9%|▉ | 670/7378 [2:18:08<22:56:20, 12.31s/it] + 9%|▉ | 671/7378 [2:18:20<22:50:55, 12.26s/it] + +{'loss': 0.602, 'learning_rate': 1.980635107640598e-05, 'epoch': 0.09} + + 9%|▉ | 671/7378 [2:18:20<22:50:55, 12.26s/it] + 9%|▉ | 672/7378 [2:18:33<22:51:59, 12.28s/it] + +{'loss': 0.4976, 'learning_rate': 1.980549034749871e-05, 'epoch': 0.09} + + 9%|▉ | 672/7378 [2:18:33<22:51:59, 12.28s/it] + 9%|▉ | 673/7378 [2:18:45<22:57:43, 12.33s/it] + +{'loss': 0.5523, 'learning_rate': 1.9804627728736848e-05, 'epoch': 0.09} + + 9%|▉ | 673/7378 [2:18:45<22:57:43, 12.33s/it] + 9%|▉ | 674/7378 [2:18:57<22:49:58, 12.26s/it] + +{'loss': 0.4978, 'learning_rate': 1.9803763220286646e-05, 'epoch': 0.09} + + 9%|▉ | 674/7378 [2:18:57<22:49:58, 12.26s/it] + 9%|▉ | 675/7378 [2:19:09<22:42:47, 12.20s/it] + +{'loss': 0.5217, 'learning_rate': 1.9802896822314726e-05, 'epoch': 0.09} + + 9%|▉ | 675/7378 [2:19:09<22:42:47, 12.20s/it] + 9%|▉ | 676/7378 [2:19:21<22:37:30, 12.15s/it] + +{'loss': 0.5016, 'learning_rate': 1.980202853498807e-05, 'epoch': 0.09} + + 9%|▉ | 676/7378 [2:19:21<22:37:30, 12.15s/it] + 9%|▉ | 677/7378 [2:19:34<22:50:42, 12.27s/it] + +{'loss': 0.5832, 'learning_rate': 1.9801158358474028e-05, 'epoch': 0.09} + + 9%|▉ | 677/7378 [2:19:34<22:50:42, 12.27s/it] + 9%|▉ | 678/7378 [2:19:47<23:03:18, 12.39s/it] + +{'loss': 0.473, 'learning_rate': 1.9800286292940313e-05, 'epoch': 0.09} + + 9%|▉ | 
678/7378 [2:19:47<23:03:18, 12.39s/it] + 9%|▉ | 679/7378 [2:19:59<22:52:21, 12.29s/it] + +{'loss': 0.5931, 'learning_rate': 1.9799412338555005e-05, 'epoch': 0.09} + + 9%|▉ | 679/7378 [2:19:59<22:52:21, 12.29s/it] + 9%|▉ | 680/7378 [2:20:11<22:46:21, 12.24s/it] + +{'loss': 0.5529, 'learning_rate': 1.979853649548654e-05, 'epoch': 0.09} + + 9%|▉ | 680/7378 [2:20:11<22:46:21, 12.24s/it] + 9%|▉ | 681/7378 [2:20:23<22:43:25, 12.22s/it] + +{'loss': 0.5567, 'learning_rate': 1.9797658763903725e-05, 'epoch': 0.09} + + 9%|▉ | 681/7378 [2:20:23<22:43:25, 12.22s/it] + 9%|▉ | 682/7378 [2:20:35<22:42:20, 12.21s/it] + +{'loss': 0.5124, 'learning_rate': 1.9796779143975732e-05, 'epoch': 0.09} + + 9%|▉ | 682/7378 [2:20:35<22:42:20, 12.21s/it] + 9%|▉ | 683/7378 [2:20:47<22:43:11, 12.22s/it] + +{'loss': 0.4463, 'learning_rate': 1.9795897635872085e-05, 'epoch': 0.09} + + 9%|▉ | 683/7378 [2:20:47<22:43:11, 12.22s/it] + 9%|▉ | 684/7378 [2:21:00<22:43:37, 12.22s/it] + +{'loss': 0.5411, 'learning_rate': 1.9795014239762692e-05, 'epoch': 0.09} + + 9%|▉ | 684/7378 [2:21:00<22:43:37, 12.22s/it] + 9%|▉ | 685/7378 [2:21:12<22:40:56, 12.20s/it] + +{'loss': 0.495, 'learning_rate': 1.9794128955817806e-05, 'epoch': 0.09} + + 9%|▉ | 685/7378 [2:21:12<22:40:56, 12.20s/it] + 9%|▉ | 686/7378 [2:21:24<22:49:42, 12.28s/it] + +{'loss': 0.459, 'learning_rate': 1.9793241784208054e-05, 'epoch': 0.09} + + 9%|▉ | 686/7378 [2:21:24<22:49:42, 12.28s/it] + 9%|▉ | 687/7378 [2:21:36<22:48:28, 12.27s/it] + +{'loss': 0.5282, 'learning_rate': 1.979235272510443e-05, 'epoch': 0.09} + + 9%|▉ | 687/7378 [2:21:36<22:48:28, 12.27s/it] + 9%|▉ | 688/7378 [2:21:49<23:05:17, 12.42s/it] + +{'loss': 0.5431, 'learning_rate': 1.9791461778678278e-05, 'epoch': 0.09} + + 9%|▉ | 688/7378 [2:21:49<23:05:17, 12.42s/it] + 9%|▉ | 689/7378 [2:22:02<23:02:41, 12.40s/it] + +{'loss': 0.543, 'learning_rate': 1.9790568945101313e-05, 'epoch': 0.09} + + 9%|▉ | 689/7378 [2:22:02<23:02:41, 12.40s/it] + 9%|▉ | 690/7378 [2:22:13<22:41:57, 12.22s/it] + 
+{'loss': 0.4736, 'learning_rate': 1.9789674224545626e-05, 'epoch': 0.09} + + 9%|▉ | 690/7378 [2:22:13<22:41:57, 12.22s/it] + 9%|▉ | 691/7378 [2:22:26<22:44:32, 12.24s/it] + +{'loss': 0.4488, 'learning_rate': 1.978877761718365e-05, 'epoch': 0.09} + + 9%|▉ | 691/7378 [2:22:26<22:44:32, 12.24s/it] + 9%|▉ | 692/7378 [2:22:38<22:39:36, 12.20s/it] + +{'loss': 0.5492, 'learning_rate': 1.9787879123188193e-05, 'epoch': 0.09} + + 9%|▉ | 692/7378 [2:22:38<22:39:36, 12.20s/it] + 9%|▉ | 693/7378 [2:22:50<22:48:37, 12.28s/it] + +{'loss': 0.5264, 'learning_rate': 1.978697874273243e-05, 'epoch': 0.09} + + 9%|▉ | 693/7378 [2:22:50<22:48:37, 12.28s/it] + 9%|▉ | 694/7378 [2:23:02<22:39:58, 12.21s/it] + +{'loss': 0.4888, 'learning_rate': 1.978607647598989e-05, 'epoch': 0.09} + + 9%|▉ | 694/7378 [2:23:02<22:39:58, 12.21s/it] + 9%|▉ | 695/7378 [2:23:15<22:46:03, 12.26s/it] + +{'loss': 0.5295, 'learning_rate': 1.9785172323134475e-05, 'epoch': 0.09} + + 9%|▉ | 695/7378 [2:23:15<22:46:03, 12.26s/it] + 9%|▉ | 696/7378 [2:23:27<22:35:41, 12.17s/it] + +{'loss': 0.4698, 'learning_rate': 1.9784266284340446e-05, 'epoch': 0.09} + + 9%|▉ | 696/7378 [2:23:27<22:35:41, 12.17s/it] + 9%|▉ | 697/7378 [2:23:39<22:26:15, 12.09s/it] + +{'loss': 0.559, 'learning_rate': 1.9783358359782424e-05, 'epoch': 0.09} + + 9%|▉ | 697/7378 [2:23:39<22:26:15, 12.09s/it] + 9%|▉ | 698/7378 [2:23:52<23:06:25, 12.45s/it] + +{'loss': 0.5286, 'learning_rate': 1.9782448549635404e-05, 'epoch': 0.09} + + 9%|▉ | 698/7378 [2:23:52<23:06:25, 12.45s/it] + 9%|▉ | 699/7378 [2:24:05<23:15:53, 12.54s/it] + +{'loss': 0.545, 'learning_rate': 1.978153685407473e-05, 'epoch': 0.09} + + 9%|▉ | 699/7378 [2:24:05<23:15:53, 12.54s/it] + 9%|▉ | 700/7378 [2:24:17<23:12:30, 12.51s/it] + +{'loss': 0.5254, 'learning_rate': 1.9780623273276123e-05, 'epoch': 0.09} + + 9%|▉ | 700/7378 [2:24:17<23:12:30, 12.51s/it] + 10%|▉ | 701/7378 [2:24:30<23:13:06, 12.52s/it] + +{'loss': 0.4946, 'learning_rate': 1.9779707807415657e-05, 'epoch': 0.1} + + 10%|▉ | 
701/7378 [2:24:30<23:13:06, 12.52s/it] + 10%|▉ | 702/7378 [2:24:42<23:07:32, 12.47s/it] + +{'loss': 0.6122, 'learning_rate': 1.9778790456669777e-05, 'epoch': 0.1} + + 10%|▉ | 702/7378 [2:24:42<23:07:32, 12.47s/it] + 10%|▉ | 703/7378 [2:24:54<22:59:51, 12.40s/it] + +{'loss': 0.478, 'learning_rate': 1.977787122121529e-05, 'epoch': 0.1} + + 10%|▉ | 703/7378 [2:24:54<22:59:51, 12.40s/it] + 10%|▉ | 704/7378 [2:25:07<23:11:20, 12.51s/it] + +{'loss': 0.495, 'learning_rate': 1.977695010122936e-05, 'epoch': 0.1} + + 10%|▉ | 704/7378 [2:25:07<23:11:20, 12.51s/it] + 10%|▉ | 705/7378 [2:25:19<22:58:13, 12.39s/it] + +{'loss': 0.5503, 'learning_rate': 1.9776027096889513e-05, 'epoch': 0.1} + + 10%|▉ | 705/7378 [2:25:19<22:58:13, 12.39s/it] + 10%|▉ | 706/7378 [2:25:31<22:55:53, 12.37s/it] + +{'loss': 0.5556, 'learning_rate': 1.9775102208373654e-05, 'epoch': 0.1} + + 10%|▉ | 706/7378 [2:25:31<22:55:53, 12.37s/it] + 10%|▉ | 707/7378 [2:25:44<23:13:50, 12.54s/it] + +{'loss': 0.58, 'learning_rate': 1.9774175435860037e-05, 'epoch': 0.1} + + 10%|▉ | 707/7378 [2:25:44<23:13:50, 12.54s/it] + 10%|▉ | 708/7378 [2:25:57<23:20:19, 12.60s/it] + +{'loss': 0.5101, 'learning_rate': 1.9773246779527282e-05, 'epoch': 0.1} + + 10%|▉ | 708/7378 [2:25:57<23:20:19, 12.60s/it] + 10%|▉ | 709/7378 [2:26:09<23:07:19, 12.48s/it] + +{'loss': 0.5081, 'learning_rate': 1.9772316239554376e-05, 'epoch': 0.1} + + 10%|▉ | 709/7378 [2:26:09<23:07:19, 12.48s/it] + 10%|▉ | 710/7378 [2:26:21<22:54:02, 12.36s/it] + +{'loss': 0.4965, 'learning_rate': 1.9771383816120658e-05, 'epoch': 0.1} + + 10%|▉ | 710/7378 [2:26:21<22:54:02, 12.36s/it] + 10%|▉ | 711/7378 [2:26:33<22:42:21, 12.26s/it] + +{'loss': 0.5113, 'learning_rate': 1.977044950940585e-05, 'epoch': 0.1} + + 10%|▉ | 711/7378 [2:26:33<22:42:21, 12.26s/it] + 10%|▉ | 712/7378 [2:26:46<22:53:31, 12.36s/it] + +{'loss': 0.539, 'learning_rate': 1.9769513319590013e-05, 'epoch': 0.1} + + 10%|▉ | 712/7378 [2:26:46<22:53:31, 12.36s/it] + 10%|▉ | 713/7378 [2:26:58<22:53:07, 
12.36s/it] + +{'loss': 0.542, 'learning_rate': 1.976857524685359e-05, 'epoch': 0.1} + + 10%|▉ | 713/7378 [2:26:58<22:53:07, 12.36s/it] + 10%|▉ | 714/7378 [2:27:10<22:47:17, 12.31s/it] + +{'loss': 0.4713, 'learning_rate': 1.976763529137738e-05, 'epoch': 0.1} + + 10%|▉ | 714/7378 [2:27:11<22:47:17, 12.31s/it] + 10%|▉ | 715/7378 [2:27:23<22:48:19, 12.32s/it] + +{'loss': 0.5437, 'learning_rate': 1.9766693453342546e-05, 'epoch': 0.1} + + 10%|▉ | 715/7378 [2:27:23<22:48:19, 12.32s/it] + 10%|▉ | 716/7378 [2:27:35<22:54:04, 12.38s/it] + +{'loss': 0.5122, 'learning_rate': 1.9765749732930603e-05, 'epoch': 0.1} + + 10%|▉ | 716/7378 [2:27:35<22:54:04, 12.38s/it] + 10%|▉ | 717/7378 [2:27:48<23:08:27, 12.51s/it] + +{'loss': 0.5375, 'learning_rate': 1.976480413032345e-05, 'epoch': 0.1} + + 10%|▉ | 717/7378 [2:27:48<23:08:27, 12.51s/it] + 10%|▉ | 718/7378 [2:28:00<23:01:31, 12.45s/it] + +{'loss': 0.4804, 'learning_rate': 1.976385664570333e-05, 'epoch': 0.1} + + 10%|▉ | 718/7378 [2:28:00<23:01:31, 12.45s/it] + 10%|▉ | 719/7378 [2:28:13<22:54:53, 12.39s/it] + +{'loss': 0.5105, 'learning_rate': 1.9762907279252857e-05, 'epoch': 0.1} + + 10%|▉ | 719/7378 [2:28:13<22:54:53, 12.39s/it] + 10%|▉ | 720/7378 [2:28:25<22:54:47, 12.39s/it] + +{'loss': 0.5639, 'learning_rate': 1.9761956031155008e-05, 'epoch': 0.1} + + 10%|▉ | 720/7378 [2:28:25<22:54:47, 12.39s/it] + 10%|▉ | 721/7378 [2:28:37<22:39:35, 12.25s/it] + +{'loss': 0.5308, 'learning_rate': 1.976100290159312e-05, 'epoch': 0.1} + + 10%|▉ | 721/7378 [2:28:37<22:39:35, 12.25s/it] + 10%|▉ | 722/7378 [2:28:49<22:39:22, 12.25s/it] + +{'loss': 0.5587, 'learning_rate': 1.9760047890750895e-05, 'epoch': 0.1} + + 10%|▉ | 722/7378 [2:28:49<22:39:22, 12.25s/it] + 10%|▉ | 723/7378 [2:29:01<22:27:17, 12.15s/it] + +{'loss': 0.4946, 'learning_rate': 1.9759090998812393e-05, 'epoch': 0.1} + + 10%|▉ | 723/7378 [2:29:01<22:27:17, 12.15s/it] + 10%|▉ | 724/7378 [2:29:13<22:24:21, 12.12s/it] + +{'loss': 0.5104, 'learning_rate': 1.9758132225962045e-05, 'epoch': 
0.1} + + 10%|▉ | 724/7378 [2:29:13<22:24:21, 12.12s/it] + 10%|▉ | 725/7378 [2:29:26<22:41:23, 12.28s/it] + +{'loss': 0.4991, 'learning_rate': 1.9757171572384637e-05, 'epoch': 0.1} + + 10%|▉ | 725/7378 [2:29:26<22:41:23, 12.28s/it] + 10%|▉ | 726/7378 [2:29:38<22:46:26, 12.33s/it] + +{'loss': 0.483, 'learning_rate': 1.9756209038265317e-05, 'epoch': 0.1} + + 10%|▉ | 726/7378 [2:29:38<22:46:26, 12.33s/it] + 10%|▉ | 727/7378 [2:29:51<22:49:19, 12.35s/it] + +{'loss': 0.5173, 'learning_rate': 1.9755244623789605e-05, 'epoch': 0.1} + + 10%|▉ | 727/7378 [2:29:51<22:49:19, 12.35s/it] + 10%|▉ | 728/7378 [2:30:03<22:42:31, 12.29s/it] + +{'loss': 0.5406, 'learning_rate': 1.975427832914337e-05, 'epoch': 0.1} + + 10%|▉ | 728/7378 [2:30:03<22:42:31, 12.29s/it] + 10%|▉ | 729/7378 [2:30:15<22:36:41, 12.24s/it] + +{'loss': 0.5986, 'learning_rate': 1.9753310154512853e-05, 'epoch': 0.1} + + 10%|▉ | 729/7378 [2:30:15<22:36:41, 12.24s/it] + 10%|▉ | 730/7378 [2:30:27<22:33:09, 12.21s/it] + +{'loss': 0.5717, 'learning_rate': 1.9752340100084658e-05, 'epoch': 0.1} + + 10%|▉ | 730/7378 [2:30:27<22:33:09, 12.21s/it] + 10%|▉ | 731/7378 [2:30:39<22:29:54, 12.19s/it] + +{'loss': 0.5237, 'learning_rate': 1.9751368166045743e-05, 'epoch': 0.1} + + 10%|▉ | 731/7378 [2:30:39<22:29:54, 12.19s/it] + 10%|▉ | 732/7378 [2:30:52<22:38:54, 12.27s/it] + +{'loss': 0.5242, 'learning_rate': 1.9750394352583434e-05, 'epoch': 0.1} + + 10%|▉ | 732/7378 [2:30:52<22:38:54, 12.27s/it] + 10%|▉ | 733/7378 [2:31:04<22:49:09, 12.36s/it] + +{'loss': 0.5016, 'learning_rate': 1.974941865988542e-05, 'epoch': 0.1} + + 10%|▉ | 733/7378 [2:31:04<22:49:09, 12.36s/it] + 10%|▉ | 734/7378 [2:31:17<23:08:36, 12.54s/it] + +{'loss': 0.5145, 'learning_rate': 1.9748441088139746e-05, 'epoch': 0.1} + + 10%|▉ | 734/7378 [2:31:17<23:08:36, 12.54s/it] + 10%|▉ | 735/7378 [2:31:30<23:06:52, 12.53s/it] + +{'loss': 0.4808, 'learning_rate': 1.9747461637534832e-05, 'epoch': 0.1} + + 10%|▉ | 735/7378 [2:31:30<23:06:52, 12.53s/it] + 10%|▉ | 736/7378 
[2:31:42<22:44:11, 12.32s/it] + +{'loss': 0.594, 'learning_rate': 1.974648030825944e-05, 'epoch': 0.1} + + 10%|▉ | 736/7378 [2:31:42<22:44:11, 12.32s/it] + 10%|▉ | 737/7378 [2:31:54<22:35:36, 12.25s/it] + +{'loss': 0.5143, 'learning_rate': 1.9745497100502717e-05, 'epoch': 0.1} + + 10%|▉ | 737/7378 [2:31:54<22:35:36, 12.25s/it] + 10%|█ | 738/7378 [2:32:06<22:30:02, 12.20s/it] + +{'loss': 0.5217, 'learning_rate': 1.9744512014454153e-05, 'epoch': 0.1} + + 10%|█ | 738/7378 [2:32:06<22:30:02, 12.20s/it] + 10%|█ | 739/7378 [2:32:18<22:44:43, 12.33s/it] + +{'loss': 0.4873, 'learning_rate': 1.9743525050303613e-05, 'epoch': 0.1} + + 10%|█ | 739/7378 [2:32:18<22:44:43, 12.33s/it] + 10%|█ | 740/7378 [2:32:31<22:56:21, 12.44s/it] + +{'loss': 0.471, 'learning_rate': 1.974253620824132e-05, 'epoch': 0.1} + + 10%|█ | 740/7378 [2:32:31<22:56:21, 12.44s/it] + 10%|█ | 741/7378 [2:32:43<22:49:18, 12.38s/it] + +{'loss': 0.5033, 'learning_rate': 1.9741545488457853e-05, 'epoch': 0.1} + + 10%|█ | 741/7378 [2:32:43<22:49:18, 12.38s/it] + 10%|█ | 742/7378 [2:32:56<23:00:00, 12.48s/it] + +{'loss': 0.573, 'learning_rate': 1.9740552891144157e-05, 'epoch': 0.1} + + 10%|█ | 742/7378 [2:32:56<23:00:00, 12.48s/it] + 10%|█ | 743/7378 [2:33:08<22:51:41, 12.40s/it] + +{'loss': 0.5269, 'learning_rate': 1.9739558416491547e-05, 'epoch': 0.1} + + 10%|█ | 743/7378 [2:33:08<22:51:41, 12.40s/it] + 10%|█ | 744/7378 [2:33:20<22:39:22, 12.29s/it] + +{'loss': 0.5223, 'learning_rate': 1.973856206469168e-05, 'epoch': 0.1} + + 10%|█ | 744/7378 [2:33:20<22:39:22, 12.29s/it] + 10%|█ | 745/7378 [2:33:33<22:37:11, 12.28s/it] + +{'loss': 0.5553, 'learning_rate': 1.9737563835936603e-05, 'epoch': 0.1} + + 10%|█ | 745/7378 [2:33:33<22:37:11, 12.28s/it] + 10%|█ | 746/7378 [2:33:45<22:36:47, 12.27s/it] + +{'loss': 0.5386, 'learning_rate': 1.9736563730418695e-05, 'epoch': 0.1} + + 10%|█ | 746/7378 [2:33:45<22:36:47, 12.27s/it] + 10%|█ | 747/7378 [2:33:58<23:00:14, 12.49s/it] + +{'loss': 0.4785, 'learning_rate': 
1.973556174833072e-05, 'epoch': 0.1} + + 10%|█ | 747/7378 [2:33:58<23:00:14, 12.49s/it] + 10%|█ | 748/7378 [2:34:10<22:53:08, 12.43s/it] + +{'loss': 0.5457, 'learning_rate': 1.9734557889865792e-05, 'epoch': 0.1} + + 10%|█ | 748/7378 [2:34:10<22:53:08, 12.43s/it] + 10%|█ | 749/7378 [2:34:22<22:32:46, 12.24s/it] + +{'loss': 0.4766, 'learning_rate': 1.9733552155217384e-05, 'epoch': 0.1} + + 10%|█ | 749/7378 [2:34:22<22:32:46, 12.24s/it] + 10%|█ | 750/7378 [2:34:34<22:42:34, 12.33s/it] + +{'loss': 0.5117, 'learning_rate': 1.973254454457934e-05, 'epoch': 0.1} + + 10%|█ | 750/7378 [2:34:34<22:42:34, 12.33s/it] + 10%|█ | 751/7378 [2:34:47<22:43:01, 12.34s/it] + +{'loss': 0.5045, 'learning_rate': 1.9731535058145862e-05, 'epoch': 0.1} + + 10%|█ | 751/7378 [2:34:47<22:43:01, 12.34s/it] + 10%|█ | 752/7378 [2:34:59<22:51:39, 12.42s/it] + +{'loss': 0.5376, 'learning_rate': 1.973052369611151e-05, 'epoch': 0.1} + + 10%|█ | 752/7378 [2:34:59<22:51:39, 12.42s/it] + 10%|█ | 753/7378 [2:35:12<22:52:16, 12.43s/it] + +{'loss': 0.5582, 'learning_rate': 1.972951045867121e-05, 'epoch': 0.1} + + 10%|█ | 753/7378 [2:35:12<22:52:16, 12.43s/it] + 10%|█ | 754/7378 [2:35:25<23:01:49, 12.52s/it] + +{'loss': 0.5289, 'learning_rate': 1.9728495346020246e-05, 'epoch': 0.1} + + 10%|█ | 754/7378 [2:35:25<23:01:49, 12.52s/it] + 10%|█ | 755/7378 [2:35:37<22:58:02, 12.48s/it] + +{'loss': 0.4748, 'learning_rate': 1.972747835835427e-05, 'epoch': 0.1} + + 10%|█ | 755/7378 [2:35:37<22:58:02, 12.48s/it] + 10%|█ | 756/7378 [2:35:49<22:49:50, 12.41s/it] + +{'loss': 0.4666, 'learning_rate': 1.9726459495869282e-05, 'epoch': 0.1} + + 10%|█ | 756/7378 [2:35:49<22:49:50, 12.41s/it] + 10%|█ | 757/7378 [2:36:02<22:49:15, 12.41s/it] + +{'loss': 0.4616, 'learning_rate': 1.9725438758761658e-05, 'epoch': 0.1} + + 10%|█ | 757/7378 [2:36:02<22:49:15, 12.41s/it] + 10%|█ | 758/7378 [2:36:14<22:37:12, 12.30s/it] + +{'loss': 0.4784, 'learning_rate': 1.9724416147228127e-05, 'epoch': 0.1} + + 10%|█ | 758/7378 [2:36:14<22:37:12, 
12.30s/it] + 10%|█ | 759/7378 [2:36:26<22:41:54, 12.35s/it] + +{'loss': 0.5159, 'learning_rate': 1.972339166146578e-05, 'epoch': 0.1} + + 10%|█ | 759/7378 [2:36:26<22:41:54, 12.35s/it] + 10%|█ | 760/7378 [2:36:39<22:55:49, 12.47s/it] + +{'loss': 0.5345, 'learning_rate': 1.9722365301672072e-05, 'epoch': 0.1} + + 10%|█ | 760/7378 [2:36:39<22:55:49, 12.47s/it] + 10%|█ | 761/7378 [2:36:51<22:34:54, 12.29s/it] + +{'loss': 0.4801, 'learning_rate': 1.972133706804482e-05, 'epoch': 0.1} + + 10%|█ | 761/7378 [2:36:51<22:34:54, 12.29s/it] + 10%|█ | 762/7378 [2:37:03<22:31:54, 12.26s/it] + +{'loss': 0.505, 'learning_rate': 1.97203069607822e-05, 'epoch': 0.1} + + 10%|█ | 762/7378 [2:37:03<22:31:54, 12.26s/it] + 10%|█ | 763/7378 [2:37:16<22:39:58, 12.34s/it] + +{'loss': 0.5384, 'learning_rate': 1.9719274980082746e-05, 'epoch': 0.1} + + 10%|█ | 763/7378 [2:37:16<22:39:58, 12.34s/it] + 10%|█ | 764/7378 [2:37:28<22:50:47, 12.44s/it] + +{'loss': 0.5217, 'learning_rate': 1.9718241126145353e-05, 'epoch': 0.1} + + 10%|█ | 764/7378 [2:37:28<22:50:47, 12.44s/it] + 10%|█ | 765/7378 [2:37:40<22:34:25, 12.29s/it] + +{'loss': 0.4801, 'learning_rate': 1.971720539916929e-05, 'epoch': 0.1} + + 10%|█ | 765/7378 [2:37:40<22:34:25, 12.29s/it] + 10%|█ | 766/7378 [2:37:52<22:33:27, 12.28s/it] + +{'loss': 0.5289, 'learning_rate': 1.971616779935417e-05, 'epoch': 0.1} + + 10%|█ | 766/7378 [2:37:52<22:33:27, 12.28s/it] + 10%|█ | 767/7378 [2:38:05<22:31:54, 12.27s/it] + +{'loss': 0.4807, 'learning_rate': 1.9715128326899972e-05, 'epoch': 0.1} + + 10%|█ | 767/7378 [2:38:05<22:31:54, 12.27s/it] + 10%|█ | 768/7378 [2:38:17<22:31:17, 12.27s/it] + +{'loss': 0.5113, 'learning_rate': 1.9714086982007044e-05, 'epoch': 0.1} + + 10%|█ | 768/7378 [2:38:17<22:31:17, 12.27s/it] + 10%|█ | 769/7378 [2:38:29<22:37:59, 12.33s/it] + +{'loss': 0.4363, 'learning_rate': 1.9713043764876088e-05, 'epoch': 0.1} + + 10%|█ | 769/7378 [2:38:29<22:37:59, 12.33s/it] + 10%|█ | 770/7378 [2:38:41<22:25:51, 12.22s/it] + +{'loss': 0.5195, 
'learning_rate': 1.9711998675708162e-05, 'epoch': 0.1} + + 10%|█ | 770/7378 [2:38:41<22:25:51, 12.22s/it] + 10%|█ | 771/7378 [2:38:54<22:36:15, 12.32s/it] + +{'loss': 0.5589, 'learning_rate': 1.9710951714704697e-05, 'epoch': 0.1} + + 10%|█ | 771/7378 [2:38:54<22:36:15, 12.32s/it] + 10%|█ | 772/7378 [2:39:06<22:41:31, 12.37s/it] + +{'loss': 0.4892, 'learning_rate': 1.9709902882067475e-05, 'epoch': 0.1} + + 10%|█ | 772/7378 [2:39:06<22:41:31, 12.37s/it] + 10%|█ | 773/7378 [2:39:20<23:08:55, 12.62s/it] + +{'loss': 0.5444, 'learning_rate': 1.9708852177998647e-05, 'epoch': 0.1} + + 10%|█ | 773/7378 [2:39:20<23:08:55, 12.62s/it] + 10%|█ | 774/7378 [2:39:32<23:04:28, 12.58s/it] + +{'loss': 0.4925, 'learning_rate': 1.9707799602700712e-05, 'epoch': 0.1} + + 10%|█ | 774/7378 [2:39:32<23:04:28, 12.58s/it] + 11%|█ | 775/7378 [2:39:44<22:55:29, 12.50s/it] + +{'loss': 0.5057, 'learning_rate': 1.9706745156376545e-05, 'epoch': 0.11} + + 11%|█ | 775/7378 [2:39:44<22:55:29, 12.50s/it] + 11%|█ | 776/7378 [2:39:58<23:18:43, 12.71s/it] + +{'loss': 0.533, 'learning_rate': 1.9705688839229365e-05, 'epoch': 0.11} + + 11%|█ | 776/7378 [2:39:58<23:18:43, 12.71s/it] + 11%|█ | 777/7378 [2:40:10<23:17:35, 12.70s/it] + +{'loss': 0.5205, 'learning_rate': 1.9704630651462767e-05, 'epoch': 0.11} + + 11%|█ | 777/7378 [2:40:10<23:17:35, 12.70s/it] + 11%|█ | 778/7378 [2:40:23<23:11:05, 12.65s/it] + +{'loss': 0.5154, 'learning_rate': 1.97035705932807e-05, 'epoch': 0.11} + + 11%|█ | 778/7378 [2:40:23<23:11:05, 12.65s/it] + 11%|█ | 779/7378 [2:40:35<23:09:19, 12.63s/it] + +{'loss': 0.517, 'learning_rate': 1.9702508664887475e-05, 'epoch': 0.11} + + 11%|█ | 779/7378 [2:40:35<23:09:19, 12.63s/it] + 11%|█ | 780/7378 [2:40:47<22:50:55, 12.47s/it] + +{'loss': 0.5181, 'learning_rate': 1.9701444866487757e-05, 'epoch': 0.11} + + 11%|█ | 780/7378 [2:40:47<22:50:55, 12.47s/it] + 11%|█ | 781/7378 [2:40:59<22:35:51, 12.33s/it] + +{'loss': 0.5244, 'learning_rate': 1.970037919828658e-05, 'epoch': 0.11} + + 11%|█ | 
781/7378 [2:40:59<22:35:51, 12.33s/it] + 11%|█ | 782/7378 [2:41:12<22:54:02, 12.50s/it] + +{'loss': 0.5144, 'learning_rate': 1.9699311660489333e-05, 'epoch': 0.11} + + 11%|█ | 782/7378 [2:41:12<22:54:02, 12.50s/it] + 11%|█ | 783/7378 [2:41:24<22:39:18, 12.37s/it] + +{'loss': 0.547, 'learning_rate': 1.969824225330177e-05, 'epoch': 0.11} + + 11%|█ | 783/7378 [2:41:24<22:39:18, 12.37s/it] + 11%|█ | 784/7378 [2:41:37<22:39:56, 12.37s/it] + +{'loss': 0.561, 'learning_rate': 1.9697170976929996e-05, 'epoch': 0.11} + + 11%|█ | 784/7378 [2:41:37<22:39:56, 12.37s/it] + 11%|█ | 785/7378 [2:41:49<22:35:45, 12.34s/it] + +{'loss': 0.5145, 'learning_rate': 1.9696097831580492e-05, 'epoch': 0.11} + + 11%|█ | 785/7378 [2:41:49<22:35:45, 12.34s/it] + 11%|█ | 786/7378 [2:42:02<22:44:56, 12.42s/it] + +{'loss': 0.4444, 'learning_rate': 1.9695022817460083e-05, 'epoch': 0.11} + + 11%|█ | 786/7378 [2:42:02<22:44:56, 12.42s/it] + 11%|█ | 787/7378 [2:42:14<22:41:04, 12.39s/it] + +{'loss': 0.4575, 'learning_rate': 1.9693945934775966e-05, 'epoch': 0.11} + + 11%|█ | 787/7378 [2:42:14<22:41:04, 12.39s/it] + 11%|█ | 788/7378 [2:42:27<23:00:05, 12.57s/it] + +{'loss': 0.5828, 'learning_rate': 1.969286718373569e-05, 'epoch': 0.11} + + 11%|█ | 788/7378 [2:42:27<23:00:05, 12.57s/it] + 11%|█ | 789/7378 [2:42:39<22:52:22, 12.50s/it] + +{'loss': 0.5462, 'learning_rate': 1.9691786564547163e-05, 'epoch': 0.11} + + 11%|█ | 789/7378 [2:42:39<22:52:22, 12.50s/it] + 11%|█ | 790/7378 [2:42:51<22:40:12, 12.39s/it] + +{'loss': 0.4923, 'learning_rate': 1.969070407741867e-05, 'epoch': 0.11} + + 11%|█ | 790/7378 [2:42:51<22:40:12, 12.39s/it] + 11%|█ | 791/7378 [2:43:04<22:41:58, 12.41s/it] + +{'loss': 0.4615, 'learning_rate': 1.968961972255883e-05, 'epoch': 0.11} + + 11%|█ | 791/7378 [2:43:04<22:41:58, 12.41s/it] + 11%|█ | 792/7378 [2:43:17<22:50:40, 12.49s/it] + +{'loss': 0.4629, 'learning_rate': 1.9688533500176645e-05, 'epoch': 0.11} + + 11%|█ | 792/7378 [2:43:17<22:50:40, 12.49s/it] + 11%|█ | 793/7378 
[2:43:29<22:51:01, 12.49s/it] + +{'loss': 0.55, 'learning_rate': 1.968744541048146e-05, 'epoch': 0.11} + + 11%|█ | 793/7378 [2:43:29<22:51:01, 12.49s/it] + 11%|█ | 794/7378 [2:43:41<22:38:41, 12.38s/it] + +{'loss': 0.4896, 'learning_rate': 1.9686355453682995e-05, 'epoch': 0.11} + + 11%|█ | 794/7378 [2:43:41<22:38:41, 12.38s/it] + 11%|█ | 795/7378 [2:43:54<22:43:33, 12.43s/it] + +{'loss': 0.4827, 'learning_rate': 1.9685263629991313e-05, 'epoch': 0.11} + + 11%|█ | 795/7378 [2:43:54<22:43:33, 12.43s/it] + 11%|█ | 796/7378 [2:44:06<22:50:03, 12.49s/it] + +{'loss': 0.4864, 'learning_rate': 1.9684169939616856e-05, 'epoch': 0.11} + + 11%|█ | 796/7378 [2:44:06<22:50:03, 12.49s/it] + 11%|█ | 797/7378 [2:44:18<22:33:32, 12.34s/it] + +{'loss': 0.5423, 'learning_rate': 1.9683074382770408e-05, 'epoch': 0.11} + + 11%|█ | 797/7378 [2:44:18<22:33:32, 12.34s/it] + 11%|█ | 798/7378 [2:44:30<22:26:51, 12.28s/it] + +{'loss': 0.4743, 'learning_rate': 1.968197695966312e-05, 'epoch': 0.11} + + 11%|█ | 798/7378 [2:44:30<22:26:51, 12.28s/it] + 11%|█ | 799/7378 [2:44:43<22:26:49, 12.28s/it] + +{'loss': 0.5134, 'learning_rate': 1.9680877670506507e-05, 'epoch': 0.11} + + 11%|█ | 799/7378 [2:44:43<22:26:49, 12.28s/it] + 11%|█ | 800/7378 [2:44:55<22:31:52, 12.33s/it] + +{'loss': 0.5138, 'learning_rate': 1.9679776515512443e-05, 'epoch': 0.11} + + 11%|█ | 800/7378 [2:44:55<22:31:52, 12.33s/it] + 11%|█ | 801/7378 [2:45:08<22:33:38, 12.35s/it] + +{'loss': 0.4675, 'learning_rate': 1.9678673494893153e-05, 'epoch': 0.11} + + 11%|█ | 801/7378 [2:45:08<22:33:38, 12.35s/it] + 11%|█ | 802/7378 [2:45:20<22:25:53, 12.28s/it] + +{'loss': 0.468, 'learning_rate': 1.9677568608861227e-05, 'epoch': 0.11} + + 11%|█ | 802/7378 [2:45:20<22:25:53, 12.28s/it] + 11%|█ | 803/7378 [2:45:33<22:45:19, 12.46s/it] + +{'loss': 0.5326, 'learning_rate': 1.9676461857629614e-05, 'epoch': 0.11} + + 11%|█ | 803/7378 [2:45:33<22:45:19, 12.46s/it] + 11%|█ | 804/7378 [2:45:45<22:48:05, 12.49s/it] + +{'loss': 0.5432, 'learning_rate': 
1.9675353241411626e-05, 'epoch': 0.11} + + 11%|█ | 804/7378 [2:45:45<22:48:05, 12.49s/it] + 11%|█ | 805/7378 [2:45:57<22:40:00, 12.41s/it] + +{'loss': 0.5192, 'learning_rate': 1.967424276042093e-05, 'epoch': 0.11} + + 11%|█ | 805/7378 [2:45:57<22:40:00, 12.41s/it] + 11%|█ | 806/7378 [2:46:09<22:27:35, 12.30s/it] + +{'loss': 0.4911, 'learning_rate': 1.9673130414871556e-05, 'epoch': 0.11} + + 11%|█ | 806/7378 [2:46:09<22:27:35, 12.30s/it] + 11%|█ | 807/7378 [2:46:22<22:23:06, 12.26s/it] + +{'loss': 0.4924, 'learning_rate': 1.9672016204977885e-05, 'epoch': 0.11} + + 11%|█ | 807/7378 [2:46:22<22:23:06, 12.26s/it] + 11%|█ | 808/7378 [2:46:34<22:24:19, 12.28s/it] + +{'loss': 0.4639, 'learning_rate': 1.967090013095467e-05, 'epoch': 0.11} + + 11%|█ | 808/7378 [2:46:34<22:24:19, 12.28s/it] + 11%|█ | 809/7378 [2:46:46<22:21:27, 12.25s/it] + +{'loss': 0.5463, 'learning_rate': 1.966978219301701e-05, 'epoch': 0.11} + + 11%|█ | 809/7378 [2:46:46<22:21:27, 12.25s/it] + 11%|█ | 810/7378 [2:46:59<22:26:53, 12.30s/it] + +{'loss': 0.5927, 'learning_rate': 1.966866239138038e-05, 'epoch': 0.11} + + 11%|█ | 810/7378 [2:46:59<22:26:53, 12.30s/it] + 11%|█ | 811/7378 [2:47:11<22:33:00, 12.36s/it] + +{'loss': 0.4757, 'learning_rate': 1.9667540726260595e-05, 'epoch': 0.11} + + 11%|█ | 811/7378 [2:47:11<22:33:00, 12.36s/it] + 11%|█ | 812/7378 [2:47:23<22:30:11, 12.34s/it] + +{'loss': 0.4953, 'learning_rate': 1.966641719787384e-05, 'epoch': 0.11} + + 11%|█ | 812/7378 [2:47:23<22:30:11, 12.34s/it] + 11%|█ | 813/7378 [2:47:36<22:32:03, 12.36s/it] + +{'loss': 0.4504, 'learning_rate': 1.9665291806436662e-05, 'epoch': 0.11} + + 11%|█ | 813/7378 [2:47:36<22:32:03, 12.36s/it] + 11%|█ | 814/7378 [2:47:48<22:44:41, 12.47s/it] + +{'loss': 0.5738, 'learning_rate': 1.9664164552165957e-05, 'epoch': 0.11} + + 11%|█ | 814/7378 [2:47:48<22:44:41, 12.47s/it] + 11%|█ | 815/7378 [2:48:01<22:37:43, 12.41s/it] + +{'loss': 0.4878, 'learning_rate': 1.9663035435278994e-05, 'epoch': 0.11} + + 11%|█ | 815/7378 
[2:48:01<22:37:43, 12.41s/it] + 11%|█ | 816/7378 [2:48:13<22:34:51, 12.39s/it] + +{'loss': 0.4401, 'learning_rate': 1.966190445599338e-05, 'epoch': 0.11} + + 11%|█ | 816/7378 [2:48:13<22:34:51, 12.39s/it] + 11%|█ | 817/7378 [2:48:26<22:48:02, 12.51s/it] + +{'loss': 0.5135, 'learning_rate': 1.9660771614527107e-05, 'epoch': 0.11} + + 11%|█ | 817/7378 [2:48:26<22:48:02, 12.51s/it] + 11%|█ | 818/7378 [2:48:38<22:37:11, 12.41s/it] + +{'loss': 0.5509, 'learning_rate': 1.9659636911098504e-05, 'epoch': 0.11} + + 11%|█ | 818/7378 [2:48:38<22:37:11, 12.41s/it] + 11%|█ | 819/7378 [2:48:51<22:47:34, 12.51s/it] + +{'loss': 0.5169, 'learning_rate': 1.965850034592627e-05, 'epoch': 0.11} + + 11%|█ | 819/7378 [2:48:51<22:47:34, 12.51s/it] + 11%|█ | 820/7378 [2:49:03<22:39:49, 12.44s/it] + +{'loss': 0.51, 'learning_rate': 1.9657361919229454e-05, 'epoch': 0.11} + + 11%|█ | 820/7378 [2:49:03<22:39:49, 12.44s/it] + 11%|█ | 821/7378 [2:49:16<22:40:40, 12.45s/it] + +{'loss': 0.491, 'learning_rate': 1.9656221631227483e-05, 'epoch': 0.11} + + 11%|█ | 821/7378 [2:49:16<22:40:40, 12.45s/it] + 11%|█ | 822/7378 [2:49:28<22:34:56, 12.40s/it] + +{'loss': 0.5726, 'learning_rate': 1.9655079482140115e-05, 'epoch': 0.11} + + 11%|█ | 822/7378 [2:49:28<22:34:56, 12.40s/it] + 11%|█ | 823/7378 [2:49:40<22:30:26, 12.36s/it] + +{'loss': 0.5032, 'learning_rate': 1.9653935472187492e-05, 'epoch': 0.11} + + 11%|█ | 823/7378 [2:49:40<22:30:26, 12.36s/it] + 11%|█ | 824/7378 [2:49:52<22:10:53, 12.18s/it] + +{'loss': 0.4943, 'learning_rate': 1.96527896015901e-05, 'epoch': 0.11} + + 11%|█ | 824/7378 [2:49:52<22:10:53, 12.18s/it] + 11%|█ | 825/7378 [2:50:04<22:18:40, 12.26s/it] + +{'loss': 0.4521, 'learning_rate': 1.9651641870568787e-05, 'epoch': 0.11} + + 11%|█ | 825/7378 [2:50:04<22:18:40, 12.26s/it] + 11%|█ | 826/7378 [2:50:16<22:11:21, 12.19s/it] + +{'loss': 0.5011, 'learning_rate': 1.965049227934476e-05, 'epoch': 0.11} + + 11%|█ | 826/7378 [2:50:16<22:11:21, 12.19s/it] + 11%|█ | 827/7378 [2:50:29<22:11:36, 
12.20s/it] + +{'loss': 0.5613, 'learning_rate': 1.964934082813959e-05, 'epoch': 0.11} + + 11%|█ | 827/7378 [2:50:29<22:11:36, 12.20s/it] + 11%|█ | 828/7378 [2:50:41<22:12:32, 12.21s/it] + +{'loss': 0.5373, 'learning_rate': 1.964818751717519e-05, 'epoch': 0.11} + + 11%|█ | 828/7378 [2:50:41<22:12:32, 12.21s/it] + 11%|█ | 829/7378 [2:50:53<22:24:52, 12.32s/it] + +{'loss': 0.5049, 'learning_rate': 1.964703234667386e-05, 'epoch': 0.11} + + 11%|█ | 829/7378 [2:50:53<22:24:52, 12.32s/it] + 11%|█ | 830/7378 [2:51:06<22:27:08, 12.34s/it] + +{'loss': 0.5308, 'learning_rate': 1.964587531685822e-05, 'epoch': 0.11} + + 11%|█ | 830/7378 [2:51:06<22:27:08, 12.34s/it] + 11%|█▏ | 831/7378 [2:51:18<22:10:27, 12.19s/it] + +{'loss': 0.5014, 'learning_rate': 1.9644716427951286e-05, 'epoch': 0.11} + + 11%|█▏ | 831/7378 [2:51:18<22:10:27, 12.19s/it] + 11%|█▏ | 832/7378 [2:51:30<22:12:20, 12.21s/it] + +{'loss': 0.4191, 'learning_rate': 1.9643555680176408e-05, 'epoch': 0.11} + + 11%|█▏ | 832/7378 [2:51:30<22:12:20, 12.21s/it] + 11%|█▏ | 833/7378 [2:51:42<22:22:05, 12.30s/it] + +{'loss': 0.5336, 'learning_rate': 1.9642393073757302e-05, 'epoch': 0.11} + + 11%|█▏ | 833/7378 [2:51:42<22:22:05, 12.30s/it] + 11%|█▏ | 834/7378 [2:51:55<22:16:20, 12.25s/it] + +{'loss': 0.4227, 'learning_rate': 1.9641228608918044e-05, 'epoch': 0.11} + + 11%|█▏ | 834/7378 [2:51:55<22:16:20, 12.25s/it] + 11%|█▏ | 835/7378 [2:52:07<22:33:20, 12.41s/it] + +{'loss': 0.5167, 'learning_rate': 1.9640062285883067e-05, 'epoch': 0.11} + + 11%|█▏ | 835/7378 [2:52:07<22:33:20, 12.41s/it] + 11%|█▏ | 836/7378 [2:52:20<22:34:46, 12.43s/it] + +{'loss': 0.534, 'learning_rate': 1.963889410487716e-05, 'epoch': 0.11} + + 11%|█▏ | 836/7378 [2:52:20<22:34:46, 12.43s/it] + 11%|█▏ | 837/7378 [2:52:32<22:32:12, 12.40s/it] + +{'loss': 0.4927, 'learning_rate': 1.9637724066125473e-05, 'epoch': 0.11} + + 11%|█▏ | 837/7378 [2:52:32<22:32:12, 12.40s/it] + 11%|█▏ | 838/7378 [2:52:45<22:35:27, 12.44s/it] + +{'loss': 0.62, 'learning_rate': 
1.9636552169853514e-05, 'epoch': 0.11} + + 11%|█▏ | 838/7378 [2:52:45<22:35:27, 12.44s/it] + 11%|█▏ | 839/7378 [2:52:57<22:31:10, 12.40s/it] + +{'loss': 0.476, 'learning_rate': 1.963537841628714e-05, 'epoch': 0.11} + + 11%|█▏ | 839/7378 [2:52:57<22:31:10, 12.40s/it] + 11%|█▏ | 840/7378 [2:53:09<22:16:53, 12.27s/it] + +{'loss': 0.4564, 'learning_rate': 1.9634202805652584e-05, 'epoch': 0.11} + + 11%|█▏ | 840/7378 [2:53:09<22:16:53, 12.27s/it] + 11%|█▏ | 841/7378 [2:53:21<22:15:09, 12.25s/it] + +{'loss': 0.5243, 'learning_rate': 1.963302533817642e-05, 'epoch': 0.11} + + 11%|█▏ | 841/7378 [2:53:21<22:15:09, 12.25s/it] + 11%|█▏ | 842/7378 [2:53:35<23:14:22, 12.80s/it] + +{'loss': 0.525, 'learning_rate': 1.9631846014085585e-05, 'epoch': 0.11} + + 11%|█▏ | 842/7378 [2:53:35<23:14:22, 12.80s/it] + 11%|█▏ | 843/7378 [2:53:47<22:49:05, 12.57s/it] + +{'loss': 0.4098, 'learning_rate': 1.9630664833607377e-05, 'epoch': 0.11} + + 11%|█▏ | 843/7378 [2:53:47<22:49:05, 12.57s/it] + 11%|█▏ | 844/7378 [2:54:00<22:39:50, 12.49s/it] + +{'loss': 0.457, 'learning_rate': 1.9629481796969455e-05, 'epoch': 0.11} + + 11%|█▏ | 844/7378 [2:54:00<22:39:50, 12.49s/it] + 11%|█▏ | 845/7378 [2:54:12<22:31:05, 12.41s/it] + +{'loss': 0.5027, 'learning_rate': 1.9628296904399828e-05, 'epoch': 0.11} + + 11%|█▏ | 845/7378 [2:54:12<22:31:05, 12.41s/it] + 11%|█▏ | 846/7378 [2:54:24<22:27:42, 12.38s/it] + +{'loss': 0.5214, 'learning_rate': 1.9627110156126862e-05, 'epoch': 0.11} + + 11%|█▏ | 846/7378 [2:54:24<22:27:42, 12.38s/it] + 11%|█▏ | 847/7378 [2:54:36<22:14:13, 12.26s/it] + +{'loss': 0.4812, 'learning_rate': 1.9625921552379288e-05, 'epoch': 0.11} + + 11%|█▏ | 847/7378 [2:54:36<22:14:13, 12.26s/it] + 11%|█▏ | 848/7378 [2:54:48<22:08:04, 12.20s/it] + +{'loss': 0.4829, 'learning_rate': 1.962473109338619e-05, 'epoch': 0.11} + + 11%|█▏ | 848/7378 [2:54:48<22:08:04, 12.20s/it] + 12%|█▏ | 849/7378 [2:55:00<22:03:46, 12.17s/it] + +{'loss': 0.5676, 'learning_rate': 1.9623538779377007e-05, 'epoch': 0.12} + + 
12%|█▏ | 849/7378 [2:55:00<22:03:46, 12.17s/it] + 12%|█▏ | 850/7378 [2:55:12<21:55:57, 12.10s/it] + +{'loss': 0.5244, 'learning_rate': 1.9622344610581542e-05, 'epoch': 0.12} + + 12%|█▏ | 850/7378 [2:55:12<21:55:57, 12.10s/it] + 12%|█▏ | 851/7378 [2:55:25<22:08:38, 12.21s/it] + +{'loss': 0.5136, 'learning_rate': 1.9621148587229954e-05, 'epoch': 0.12} + + 12%|█▏ | 851/7378 [2:55:25<22:08:38, 12.21s/it] + 12%|█▏ | 852/7378 [2:55:37<22:14:12, 12.27s/it] + +{'loss': 0.5124, 'learning_rate': 1.961995070955275e-05, 'epoch': 0.12} + + 12%|█▏ | 852/7378 [2:55:37<22:14:12, 12.27s/it] + 12%|█▏ | 853/7378 [2:55:49<22:08:19, 12.21s/it] + +{'loss': 0.4847, 'learning_rate': 1.9618750977780813e-05, 'epoch': 0.12} + + 12%|█▏ | 853/7378 [2:55:49<22:08:19, 12.21s/it] + 12%|█▏ | 854/7378 [2:56:01<22:13:05, 12.26s/it] + +{'loss': 0.5481, 'learning_rate': 1.9617549392145365e-05, 'epoch': 0.12} + + 12%|█▏ | 854/7378 [2:56:01<22:13:05, 12.26s/it] + 12%|█▏ | 855/7378 [2:56:14<22:24:33, 12.37s/it] + +{'loss': 0.5388, 'learning_rate': 1.9616345952877998e-05, 'epoch': 0.12} + + 12%|█▏ | 855/7378 [2:56:14<22:24:33, 12.37s/it] + 12%|█▏ | 856/7378 [2:56:26<22:21:29, 12.34s/it] + +{'loss': 0.491, 'learning_rate': 1.961514066021065e-05, 'epoch': 0.12} + + 12%|█▏ | 856/7378 [2:56:26<22:21:29, 12.34s/it] + 12%|█▏ | 857/7378 [2:56:39<22:21:16, 12.34s/it] + +{'loss': 0.4945, 'learning_rate': 1.9613933514375623e-05, 'epoch': 0.12} + + 12%|█▏ | 857/7378 [2:56:39<22:21:16, 12.34s/it] + 12%|█▏ | 858/7378 [2:56:51<22:27:30, 12.40s/it] + +{'loss': 0.4761, 'learning_rate': 1.9612724515605582e-05, 'epoch': 0.12} + + 12%|█▏ | 858/7378 [2:56:51<22:27:30, 12.40s/it] + 12%|█▏ | 859/7378 [2:57:04<22:36:40, 12.49s/it] + +{'loss': 0.5247, 'learning_rate': 1.9611513664133535e-05, 'epoch': 0.12} + + 12%|█▏ | 859/7378 [2:57:04<22:36:40, 12.49s/it] + 12%|█▏ | 860/7378 [2:57:16<22:37:46, 12.50s/it] + +{'loss': 0.4763, 'learning_rate': 1.9610300960192864e-05, 'epoch': 0.12} + + 12%|█▏ | 860/7378 [2:57:16<22:37:46, 
12.50s/it] + 12%|█▏ | 861/7378 [2:57:29<22:33:11, 12.46s/it] + +{'loss': 0.5436, 'learning_rate': 1.9609086404017287e-05, 'epoch': 0.12} + + 12%|█▏ | 861/7378 [2:57:29<22:33:11, 12.46s/it] + 12%|█▏ | 862/7378 [2:57:42<22:44:36, 12.57s/it] + +{'loss': 0.5503, 'learning_rate': 1.96078699958409e-05, 'epoch': 0.12} + + 12%|█▏ | 862/7378 [2:57:42<22:44:36, 12.57s/it] + 12%|█▏ | 863/7378 [2:57:55<22:58:25, 12.69s/it] + +{'loss': 0.5224, 'learning_rate': 1.9606651735898138e-05, 'epoch': 0.12} + + 12%|█▏ | 863/7378 [2:57:55<22:58:25, 12.69s/it] + 12%|█▏ | 864/7378 [2:58:07<22:52:51, 12.65s/it] + +{'loss': 0.4399, 'learning_rate': 1.960543162442381e-05, 'epoch': 0.12} + + 12%|█▏ | 864/7378 [2:58:07<22:52:51, 12.65s/it] + 12%|█▏ | 865/7378 [2:58:20<22:46:07, 12.59s/it] + +{'loss': 0.4914, 'learning_rate': 1.9604209661653067e-05, 'epoch': 0.12} + + 12%|█▏ | 865/7378 [2:58:20<22:46:07, 12.59s/it] + 12%|█▏ | 866/7378 [2:58:32<22:34:11, 12.48s/it] + +{'loss': 0.4888, 'learning_rate': 1.960298584782143e-05, 'epoch': 0.12} + + 12%|█▏ | 866/7378 [2:58:32<22:34:11, 12.48s/it] + 12%|█▏ | 867/7378 [2:58:44<22:27:25, 12.42s/it] + +{'loss': 0.4346, 'learning_rate': 1.9601760183164762e-05, 'epoch': 0.12} + + 12%|█▏ | 867/7378 [2:58:44<22:27:25, 12.42s/it] + 12%|█▏ | 868/7378 [2:58:56<22:19:12, 12.34s/it] + +{'loss': 0.4916, 'learning_rate': 1.96005326679193e-05, 'epoch': 0.12} + + 12%|█▏ | 868/7378 [2:58:56<22:19:12, 12.34s/it] + 12%|█▏ | 869/7378 [2:59:08<22:14:56, 12.31s/it] + +{'loss': 0.5317, 'learning_rate': 1.9599303302321616e-05, 'epoch': 0.12} + + 12%|█▏ | 869/7378 [2:59:08<22:14:56, 12.31s/it] + 12%|█▏ | 870/7378 [2:59:21<22:13:57, 12.30s/it] + +{'loss': 0.532, 'learning_rate': 1.9598072086608663e-05, 'epoch': 0.12} + + 12%|█▏ | 870/7378 [2:59:21<22:13:57, 12.30s/it] + 12%|█▏ | 871/7378 [2:59:33<22:11:18, 12.28s/it] + +{'loss': 0.5114, 'learning_rate': 1.9596839021017732e-05, 'epoch': 0.12} + + 12%|█▏ | 871/7378 [2:59:33<22:11:18, 12.28s/it] + 12%|█▏ | 872/7378 
[2:59:46<22:20:31, 12.36s/it] + +{'loss': 0.5279, 'learning_rate': 1.9595604105786477e-05, 'epoch': 0.12} + + 12%|█▏ | 872/7378 [2:59:46<22:20:31, 12.36s/it] + 12%|█▏ | 873/7378 [2:59:58<22:15:27, 12.32s/it] + +{'loss': 0.5417, 'learning_rate': 1.959436734115291e-05, 'epoch': 0.12} + + 12%|█▏ | 873/7378 [2:59:58<22:15:27, 12.32s/it] + 12%|█▏ | 874/7378 [3:00:10<22:22:25, 12.38s/it] + +{'loss': 0.5314, 'learning_rate': 1.9593128727355398e-05, 'epoch': 0.12} + + 12%|█▏ | 874/7378 [3:00:10<22:22:25, 12.38s/it] + 12%|█▏ | 875/7378 [3:00:23<22:30:42, 12.46s/it] + +{'loss': 0.4784, 'learning_rate': 1.9591888264632664e-05, 'epoch': 0.12} + + 12%|█▏ | 875/7378 [3:00:23<22:30:42, 12.46s/it] + 12%|█▏ | 876/7378 [3:00:35<22:26:04, 12.42s/it] + +{'loss': 0.4529, 'learning_rate': 1.9590645953223792e-05, 'epoch': 0.12} + + 12%|█▏ | 876/7378 [3:00:35<22:26:04, 12.42s/it] + 12%|█▏ | 877/7378 [3:00:48<22:35:41, 12.51s/it] + +{'loss': 0.5764, 'learning_rate': 1.958940179336821e-05, 'epoch': 0.12} + + 12%|█▏ | 877/7378 [3:00:48<22:35:41, 12.51s/it] + 12%|█▏ | 878/7378 [3:01:00<22:18:44, 12.36s/it] + +{'loss': 0.459, 'learning_rate': 1.958815578530572e-05, 'epoch': 0.12} + + 12%|█▏ | 878/7378 [3:01:00<22:18:44, 12.36s/it] + 12%|█▏ | 879/7378 [3:01:12<22:12:55, 12.31s/it] + +{'loss': 0.5352, 'learning_rate': 1.9586907929276458e-05, 'epoch': 0.12} + + 12%|█▏ | 879/7378 [3:01:12<22:12:55, 12.31s/it] + 12%|█▏ | 880/7378 [3:01:25<22:15:52, 12.33s/it] + +{'loss': 0.5059, 'learning_rate': 1.958565822552094e-05, 'epoch': 0.12} + + 12%|█▏ | 880/7378 [3:01:25<22:15:52, 12.33s/it] + 12%|█▏ | 881/7378 [3:01:37<22:09:38, 12.28s/it] + +{'loss': 0.4832, 'learning_rate': 1.958440667428002e-05, 'epoch': 0.12} + + 12%|█▏ | 881/7378 [3:01:37<22:09:38, 12.28s/it] + 12%|█▏ | 882/7378 [3:01:49<22:04:39, 12.24s/it] + +{'loss': 0.4946, 'learning_rate': 1.958315327579492e-05, 'epoch': 0.12} + + 12%|█▏ | 882/7378 [3:01:49<22:04:39, 12.24s/it] + 12%|█▏ | 883/7378 [3:02:01<21:59:27, 12.19s/it] + +{'loss': 
0.5379, 'learning_rate': 1.958189803030721e-05, 'epoch': 0.12} + + 12%|█▏ | 883/7378 [3:02:01<21:59:27, 12.19s/it] + 12%|█▏ | 884/7378 [3:02:14<22:14:13, 12.33s/it] + +{'loss': 0.5271, 'learning_rate': 1.9580640938058817e-05, 'epoch': 0.12} + + 12%|█▏ | 884/7378 [3:02:14<22:14:13, 12.33s/it] + 12%|█▏ | 885/7378 [3:02:26<22:15:12, 12.34s/it] + +{'loss': 0.4676, 'learning_rate': 1.957938199929203e-05, 'epoch': 0.12} + + 12%|█▏ | 885/7378 [3:02:26<22:15:12, 12.34s/it] + 12%|█▏ | 886/7378 [3:02:38<22:16:00, 12.35s/it] + +{'loss': 0.4788, 'learning_rate': 1.9578121214249485e-05, 'epoch': 0.12} + + 12%|█▏ | 886/7378 [3:02:38<22:16:00, 12.35s/it] + 12%|█▏ | 887/7378 [3:02:50<22:04:59, 12.25s/it] + +{'loss': 0.5276, 'learning_rate': 1.9576858583174185e-05, 'epoch': 0.12} + + 12%|█▏ | 887/7378 [3:02:50<22:04:59, 12.25s/it] + 12%|█▏ | 888/7378 [3:03:03<22:09:02, 12.29s/it] + +{'loss': 0.469, 'learning_rate': 1.957559410630948e-05, 'epoch': 0.12} + + 12%|█▏ | 888/7378 [3:03:03<22:09:02, 12.29s/it] + 12%|█▏ | 889/7378 [3:03:15<21:59:57, 12.20s/it] + +{'loss': 0.5315, 'learning_rate': 1.9574327783899073e-05, 'epoch': 0.12} + + 12%|█▏ | 889/7378 [3:03:15<21:59:57, 12.20s/it] + 12%|█▏ | 890/7378 [3:03:27<21:53:01, 12.14s/it] + +{'loss': 0.4795, 'learning_rate': 1.9573059616187035e-05, 'epoch': 0.12} + + 12%|█▏ | 890/7378 [3:03:27<21:53:01, 12.14s/it] + 12%|█▏ | 891/7378 [3:03:39<22:04:03, 12.25s/it] + +{'loss': 0.4831, 'learning_rate': 1.9571789603417775e-05, 'epoch': 0.12} + + 12%|█▏ | 891/7378 [3:03:39<22:04:03, 12.25s/it] + 12%|█▏ | 892/7378 [3:03:51<22:04:19, 12.25s/it] + +{'loss': 0.4955, 'learning_rate': 1.9570517745836083e-05, 'epoch': 0.12} + + 12%|█▏ | 892/7378 [3:03:51<22:04:19, 12.25s/it] + 12%|█▏ | 893/7378 [3:04:04<22:04:20, 12.25s/it] + +{'loss': 0.5416, 'learning_rate': 1.9569244043687075e-05, 'epoch': 0.12} + + 12%|█▏ | 893/7378 [3:04:04<22:04:20, 12.25s/it] + 12%|█▏ | 894/7378 [3:04:16<22:02:42, 12.24s/it] + +{'loss': 0.5114, 'learning_rate': 
1.956796849721625e-05, 'epoch': 0.12} + + 12%|█▏ | 894/7378 [3:04:16<22:02:42, 12.24s/it] + 12%|█▏ | 895/7378 [3:04:29<22:19:13, 12.39s/it] + +{'loss': 0.4765, 'learning_rate': 1.956669110666944e-05, 'epoch': 0.12} + + 12%|█▏ | 895/7378 [3:04:29<22:19:13, 12.39s/it] + 12%|█▏ | 896/7378 [3:04:42<22:33:53, 12.53s/it] + +{'loss': 0.5422, 'learning_rate': 1.9565411872292846e-05, 'epoch': 0.12} + + 12%|█▏ | 896/7378 [3:04:42<22:33:53, 12.53s/it] + 12%|█▏ | 897/7378 [3:04:54<22:35:21, 12.55s/it] + +{'loss': 0.4897, 'learning_rate': 1.9564130794333024e-05, 'epoch': 0.12} + + 12%|█▏ | 897/7378 [3:04:54<22:35:21, 12.55s/it] + 12%|█▏ | 898/7378 [3:05:07<22:40:16, 12.60s/it] + +{'loss': 0.5616, 'learning_rate': 1.956284787303687e-05, 'epoch': 0.12} + + 12%|█▏ | 898/7378 [3:05:07<22:40:16, 12.60s/it] + 12%|█▏ | 899/7378 [3:05:19<22:40:16, 12.60s/it] + +{'loss': 0.4505, 'learning_rate': 1.956156310865166e-05, 'epoch': 0.12} + + 12%|█▏ | 899/7378 [3:05:19<22:40:16, 12.60s/it] + 12%|█▏ | 900/7378 [3:05:32<22:23:49, 12.45s/it] + +{'loss': 0.4272, 'learning_rate': 1.9560276501425003e-05, 'epoch': 0.12} + + 12%|█▏ | 900/7378 [3:05:32<22:23:49, 12.45s/it] + 12%|█▏ | 901/7378 [3:05:44<22:12:23, 12.34s/it] + +{'loss': 0.5481, 'learning_rate': 1.955898805160488e-05, 'epoch': 0.12} + + 12%|█▏ | 901/7378 [3:05:44<22:12:23, 12.34s/it] + 12%|█▏ | 902/7378 [3:05:56<22:16:30, 12.38s/it] + +{'loss': 0.4923, 'learning_rate': 1.9557697759439613e-05, 'epoch': 0.12} + + 12%|█▏ | 902/7378 [3:05:56<22:16:30, 12.38s/it] + 12%|█▏ | 903/7378 [3:06:09<22:32:50, 12.54s/it] + +{'loss': 0.4705, 'learning_rate': 1.9556405625177886e-05, 'epoch': 0.12} + + 12%|█▏ | 903/7378 [3:06:09<22:32:50, 12.54s/it] + 12%|█▏ | 904/7378 [3:06:21<22:21:21, 12.43s/it] + +{'loss': 0.508, 'learning_rate': 1.9555111649068746e-05, 'epoch': 0.12} + + 12%|█▏ | 904/7378 [3:06:21<22:21:21, 12.43s/it] + 12%|█▏ | 905/7378 [3:06:33<22:06:29, 12.30s/it] + +{'loss': 0.5056, 'learning_rate': 1.9553815831361577e-05, 'epoch': 0.12} + + 
12%|█▏ | 905/7378 [3:06:33<22:06:29, 12.30s/it] + 12%|█▏ | 906/7378 [3:06:46<22:16:17, 12.39s/it] + +{'loss': 0.5429, 'learning_rate': 1.955251817230613e-05, 'epoch': 0.12} + + 12%|█▏ | 906/7378 [3:06:46<22:16:17, 12.39s/it] + 12%|█▏ | 907/7378 [3:06:58<22:11:05, 12.34s/it] + +{'loss': 0.4596, 'learning_rate': 1.955121867215251e-05, 'epoch': 0.12} + + 12%|█▏ | 907/7378 [3:06:58<22:11:05, 12.34s/it] + 12%|█▏ | 908/7378 [3:07:11<22:16:17, 12.39s/it] + +{'loss': 0.5574, 'learning_rate': 1.9549917331151177e-05, 'epoch': 0.12} + + 12%|█▏ | 908/7378 [3:07:11<22:16:17, 12.39s/it] + 12%|█▏ | 909/7378 [3:07:23<22:16:54, 12.40s/it] + +{'loss': 0.4535, 'learning_rate': 1.954861414955294e-05, 'epoch': 0.12} + + 12%|█▏ | 909/7378 [3:07:23<22:16:54, 12.40s/it] + 12%|█▏ | 910/7378 [3:07:35<22:17:27, 12.41s/it] + +{'loss': 0.5252, 'learning_rate': 1.954730912760897e-05, 'epoch': 0.12} + + 12%|█▏ | 910/7378 [3:07:35<22:17:27, 12.41s/it] + 12%|█▏ | 911/7378 [3:07:48<22:08:42, 12.33s/it] + +{'loss': 0.5082, 'learning_rate': 1.9546002265570786e-05, 'epoch': 0.12} + + 12%|█▏ | 911/7378 [3:07:48<22:08:42, 12.33s/it] + 12%|█▏ | 912/7378 [3:08:00<22:23:07, 12.46s/it] + +{'loss': 0.4816, 'learning_rate': 1.9544693563690266e-05, 'epoch': 0.12} + + 12%|█▏ | 912/7378 [3:08:00<22:23:07, 12.46s/it] + 12%|█▏ | 913/7378 [3:08:12<22:14:57, 12.39s/it] + +{'loss': 0.4958, 'learning_rate': 1.9543383022219646e-05, 'epoch': 0.12} + + 12%|█▏ | 913/7378 [3:08:13<22:14:57, 12.39s/it] + 12%|█▏ | 914/7378 [3:08:24<22:00:01, 12.25s/it] + +{'loss': 0.4838, 'learning_rate': 1.954207064141151e-05, 'epoch': 0.12} + + 12%|█▏ | 914/7378 [3:08:24<22:00:01, 12.25s/it] + 12%|█▏ | 915/7378 [3:08:37<21:59:01, 12.25s/it] + +{'loss': 0.5086, 'learning_rate': 1.9540756421518798e-05, 'epoch': 0.12} + + 12%|█▏ | 915/7378 [3:08:37<21:59:01, 12.25s/it] + 12%|█▏ | 916/7378 [3:08:49<22:00:22, 12.26s/it] + +{'loss': 0.5146, 'learning_rate': 1.9539440362794803e-05, 'epoch': 0.12} + + 12%|█▏ | 916/7378 [3:08:49<22:00:22, 
12.26s/it] + 12%|█▏ | 917/7378 [3:09:01<21:54:33, 12.21s/it] + +{'loss': 0.5045, 'learning_rate': 1.953812246549318e-05, 'epoch': 0.12} + + 12%|█▏ | 917/7378 [3:09:01<21:54:33, 12.21s/it] + 12%|█▏ | 918/7378 [3:09:13<21:43:17, 12.10s/it] + +{'loss': 0.4995, 'learning_rate': 1.9536802729867926e-05, 'epoch': 0.12} + + 12%|█▏ | 918/7378 [3:09:13<21:43:17, 12.10s/it] + 12%|█▏ | 919/7378 [3:09:25<21:49:27, 12.16s/it] + +{'loss': 0.4631, 'learning_rate': 1.9535481156173408e-05, 'epoch': 0.12} + + 12%|█▏ | 919/7378 [3:09:25<21:49:27, 12.16s/it] + 12%|█▏ | 920/7378 [3:09:38<22:02:13, 12.28s/it] + +{'loss': 0.4931, 'learning_rate': 1.9534157744664336e-05, 'epoch': 0.12} + + 12%|█▏ | 920/7378 [3:09:38<22:02:13, 12.28s/it] + 12%|█▏ | 921/7378 [3:09:50<22:02:57, 12.29s/it] + +{'loss': 0.5531, 'learning_rate': 1.953283249559577e-05, 'epoch': 0.12} + + 12%|█▏ | 921/7378 [3:09:50<22:02:57, 12.29s/it] + 12%|█▏ | 922/7378 [3:10:02<21:55:38, 12.23s/it] + +{'loss': 0.506, 'learning_rate': 1.9531505409223143e-05, 'epoch': 0.12} + + 12%|█▏ | 922/7378 [3:10:02<21:55:38, 12.23s/it] + 13%|█▎ | 923/7378 [3:10:14<21:50:32, 12.18s/it] + +{'loss': 0.4392, 'learning_rate': 1.9530176485802217e-05, 'epoch': 0.13} + + 13%|█▎ | 923/7378 [3:10:14<21:50:32, 12.18s/it] + 13%|█▎ | 924/7378 [3:10:26<21:45:21, 12.14s/it] + +{'loss': 0.4522, 'learning_rate': 1.9528845725589126e-05, 'epoch': 0.13} + + 13%|█▎ | 924/7378 [3:10:26<21:45:21, 12.14s/it] + 13%|█▎ | 925/7378 [3:10:39<22:08:04, 12.35s/it] + +{'loss': 0.554, 'learning_rate': 1.952751312884036e-05, 'epoch': 0.13} + + 13%|█▎ | 925/7378 [3:10:39<22:08:04, 12.35s/it] + 13%|█▎ | 926/7378 [3:10:51<21:52:59, 12.21s/it] + +{'loss': 0.5238, 'learning_rate': 1.9526178695812747e-05, 'epoch': 0.13} + + 13%|█▎ | 926/7378 [3:10:51<21:52:59, 12.21s/it] + 13%|█▎ | 927/7378 [3:11:03<22:01:56, 12.30s/it] + +{'loss': 0.5863, 'learning_rate': 1.9524842426763484e-05, 'epoch': 0.13} + + 13%|█▎ | 927/7378 [3:11:03<22:01:56, 12.30s/it] + 13%|█▎ | 928/7378 
[3:11:16<22:00:35, 12.28s/it] + +{'loss': 0.5208, 'learning_rate': 1.9523504321950113e-05, 'epoch': 0.13} + + 13%|█▎ | 928/7378 [3:11:16<22:00:35, 12.28s/it] + 13%|█▎ | 929/7378 [3:11:28<22:03:28, 12.31s/it] + +{'loss': 0.5226, 'learning_rate': 1.9522164381630534e-05, 'epoch': 0.13} + + 13%|█▎ | 929/7378 [3:11:28<22:03:28, 12.31s/it] + 13%|█▎ | 930/7378 [3:11:40<21:57:29, 12.26s/it] + +{'loss': 0.5738, 'learning_rate': 1.9520822606063e-05, 'epoch': 0.13} + + 13%|█▎ | 930/7378 [3:11:40<21:57:29, 12.26s/it] + 13%|█▎ | 931/7378 [3:11:53<22:09:19, 12.37s/it] + +{'loss': 0.5461, 'learning_rate': 1.9519478995506112e-05, 'epoch': 0.13} + + 13%|█▎ | 931/7378 [3:11:53<22:09:19, 12.37s/it] + 13%|█▎ | 932/7378 [3:12:06<22:17:49, 12.45s/it] + +{'loss': 0.5198, 'learning_rate': 1.9518133550218836e-05, 'epoch': 0.13} + + 13%|█▎ | 932/7378 [3:12:06<22:17:49, 12.45s/it] + 13%|█▎ | 933/7378 [3:12:18<22:16:34, 12.44s/it] + +{'loss': 0.5268, 'learning_rate': 1.9516786270460484e-05, 'epoch': 0.13} + + 13%|█▎ | 933/7378 [3:12:18<22:16:34, 12.44s/it] + 13%|█▎ | 934/7378 [3:12:30<22:17:11, 12.45s/it] + +{'loss': 0.482, 'learning_rate': 1.9515437156490724e-05, 'epoch': 0.13} + + 13%|█▎ | 934/7378 [3:12:31<22:17:11, 12.45s/it] + 13%|█▎ | 935/7378 [3:12:43<22:11:08, 12.40s/it] + +{'loss': 0.4276, 'learning_rate': 1.951408620856957e-05, 'epoch': 0.13} + + 13%|█▎ | 935/7378 [3:12:43<22:11:08, 12.40s/it] + 13%|█▎ | 936/7378 [3:12:55<22:04:40, 12.34s/it] + +{'loss': 0.5622, 'learning_rate': 1.9512733426957403e-05, 'epoch': 0.13} + + 13%|█▎ | 936/7378 [3:12:55<22:04:40, 12.34s/it] + 13%|█▎ | 937/7378 [3:13:07<22:10:57, 12.40s/it] + +{'loss': 0.5463, 'learning_rate': 1.9511378811914952e-05, 'epoch': 0.13} + + 13%|█▎ | 937/7378 [3:13:07<22:10:57, 12.40s/it] + 13%|█▎ | 938/7378 [3:13:20<22:10:32, 12.40s/it] + +{'loss': 0.5945, 'learning_rate': 1.951002236370329e-05, 'epoch': 0.13} + + 13%|█▎ | 938/7378 [3:13:20<22:10:32, 12.40s/it] + 13%|█▎ | 939/7378 [3:13:32<22:18:26, 12.47s/it] + +{'loss': 
0.4789, 'learning_rate': 1.950866408258386e-05, 'epoch': 0.13} + + 13%|█▎ | 939/7378 [3:13:32<22:18:26, 12.47s/it] + 13%|█▎ | 940/7378 [3:13:45<22:16:21, 12.45s/it] + +{'loss': 0.5499, 'learning_rate': 1.9507303968818443e-05, 'epoch': 0.13} + + 13%|█▎ | 940/7378 [3:13:45<22:16:21, 12.45s/it] + 13%|█▎ | 941/7378 [3:13:58<22:43:41, 12.71s/it] + +{'loss': 0.4702, 'learning_rate': 1.950594202266918e-05, 'epoch': 0.13} + + 13%|█▎ | 941/7378 [3:13:58<22:43:41, 12.71s/it] + 13%|█▎ | 942/7378 [3:14:10<22:29:51, 12.58s/it] + +{'loss': 0.4794, 'learning_rate': 1.950457824439857e-05, 'epoch': 0.13} + + 13%|█▎ | 942/7378 [3:14:10<22:29:51, 12.58s/it] + 13%|█▎ | 943/7378 [3:14:23<22:19:58, 12.49s/it] + +{'loss': 0.4897, 'learning_rate': 1.9503212634269454e-05, 'epoch': 0.13} + + 13%|█▎ | 943/7378 [3:14:23<22:19:58, 12.49s/it] + 13%|█▎ | 944/7378 [3:14:35<22:14:05, 12.44s/it] + +{'loss': 0.4912, 'learning_rate': 1.9501845192545036e-05, 'epoch': 0.13} + + 13%|█▎ | 944/7378 [3:14:35<22:14:05, 12.44s/it] + 13%|█▎ | 945/7378 [3:14:47<22:03:48, 12.35s/it] + +{'loss': 0.5284, 'learning_rate': 1.9500475919488866e-05, 'epoch': 0.13} + + 13%|█▎ | 945/7378 [3:14:47<22:03:48, 12.35s/it] + 13%|█▎ | 946/7378 [3:14:59<21:56:00, 12.28s/it] + +{'loss': 0.475, 'learning_rate': 1.949910481536485e-05, 'epoch': 0.13} + + 13%|█▎ | 946/7378 [3:14:59<21:56:00, 12.28s/it] + 13%|█▎ | 947/7378 [3:15:11<21:46:58, 12.19s/it] + +{'loss': 0.4989, 'learning_rate': 1.9497731880437246e-05, 'epoch': 0.13} + + 13%|█▎ | 947/7378 [3:15:11<21:46:58, 12.19s/it] + 13%|█▎ | 948/7378 [3:15:24<21:55:39, 12.28s/it] + +{'loss': 0.4723, 'learning_rate': 1.9496357114970673e-05, 'epoch': 0.13} + + 13%|█▎ | 948/7378 [3:15:24<21:55:39, 12.28s/it] + 13%|█▎ | 949/7378 [3:15:36<21:52:13, 12.25s/it] + +{'loss': 0.4537, 'learning_rate': 1.9494980519230086e-05, 'epoch': 0.13} + + 13%|█▎ | 949/7378 [3:15:36<21:52:13, 12.25s/it] + 13%|█▎ | 950/7378 [3:15:48<21:56:42, 12.29s/it] + +{'loss': 0.5426, 'learning_rate': 
1.9493602093480807e-05, 'epoch': 0.13} + + 13%|█▎ | 950/7378 [3:15:48<21:56:42, 12.29s/it] + 13%|█▎ | 951/7378 [3:16:01<21:54:38, 12.27s/it] + +{'loss': 0.4936, 'learning_rate': 1.9492221837988506e-05, 'epoch': 0.13} + + 13%|█▎ | 951/7378 [3:16:01<21:54:38, 12.27s/it] + 13%|█▎ | 952/7378 [3:16:13<21:53:03, 12.26s/it] + +{'loss': 0.5338, 'learning_rate': 1.9490839753019205e-05, 'epoch': 0.13} + + 13%|█▎ | 952/7378 [3:16:13<21:53:03, 12.26s/it] + 13%|█▎ | 953/7378 [3:16:26<22:07:21, 12.40s/it] + +{'loss': 0.5102, 'learning_rate': 1.948945583883928e-05, 'epoch': 0.13} + + 13%|█▎ | 953/7378 [3:16:26<22:07:21, 12.40s/it] + 13%|█▎ | 954/7378 [3:16:38<22:02:12, 12.35s/it] + +{'loss': 0.5221, 'learning_rate': 1.9488070095715457e-05, 'epoch': 0.13} + + 13%|█▎ | 954/7378 [3:16:38<22:02:12, 12.35s/it] + 13%|█▎ | 955/7378 [3:16:50<21:50:32, 12.24s/it] + +{'loss': 0.5028, 'learning_rate': 1.9486682523914816e-05, 'epoch': 0.13} + + 13%|█▎ | 955/7378 [3:16:50<21:50:32, 12.24s/it] + 13%|█▎ | 956/7378 [3:17:02<21:44:40, 12.19s/it] + +{'loss': 0.5389, 'learning_rate': 1.948529312370479e-05, 'epoch': 0.13} + + 13%|█▎ | 956/7378 [3:17:02<21:44:40, 12.19s/it] + 13%|█▎ | 957/7378 [3:17:14<21:39:21, 12.14s/it] + +{'loss': 0.489, 'learning_rate': 1.948390189535317e-05, 'epoch': 0.13} + + 13%|█▎ | 957/7378 [3:17:14<21:39:21, 12.14s/it] + 13%|█▎ | 958/7378 [3:17:26<21:49:07, 12.23s/it] + +{'loss': 0.4636, 'learning_rate': 1.9482508839128087e-05, 'epoch': 0.13} + + 13%|█▎ | 958/7378 [3:17:26<21:49:07, 12.23s/it] + 13%|█▎ | 959/7378 [3:17:38<21:42:28, 12.17s/it] + +{'loss': 0.5744, 'learning_rate': 1.948111395529803e-05, 'epoch': 0.13} + + 13%|█▎ | 959/7378 [3:17:38<21:42:28, 12.17s/it] + 13%|█▎ | 960/7378 [3:17:51<21:45:08, 12.20s/it] + +{'loss': 0.5423, 'learning_rate': 1.9479717244131845e-05, 'epoch': 0.13} + + 13%|█▎ | 960/7378 [3:17:51<21:45:08, 12.20s/it] + 13%|█▎ | 961/7378 [3:18:03<21:51:29, 12.26s/it] + +{'loss': 0.4619, 'learning_rate': 1.9478318705898724e-05, 'epoch': 0.13} + + 
13%|█▎ | 961/7378 [3:18:03<21:51:29, 12.26s/it] + 13%|█▎ | 962/7378 [3:18:16<22:03:44, 12.38s/it] + +{'loss': 0.4622, 'learning_rate': 1.9476918340868212e-05, 'epoch': 0.13} + + 13%|█▎ | 962/7378 [3:18:16<22:03:44, 12.38s/it] + 13%|█▎ | 963/7378 [3:18:28<21:55:15, 12.30s/it] + +{'loss': 0.4852, 'learning_rate': 1.9475516149310208e-05, 'epoch': 0.13} + + 13%|█▎ | 963/7378 [3:18:28<21:55:15, 12.30s/it] + 13%|█▎ | 964/7378 [3:18:40<21:46:29, 12.22s/it] + +{'loss': 0.4412, 'learning_rate': 1.947411213149497e-05, 'epoch': 0.13} + + 13%|█▎ | 964/7378 [3:18:40<21:46:29, 12.22s/it] + 13%|█▎ | 965/7378 [3:18:52<21:49:31, 12.25s/it] + +{'loss': 0.6175, 'learning_rate': 1.9472706287693088e-05, 'epoch': 0.13} + + 13%|█▎ | 965/7378 [3:18:52<21:49:31, 12.25s/it] + 13%|█▎ | 966/7378 [3:19:05<21:55:17, 12.31s/it] + +{'loss': 0.4818, 'learning_rate': 1.9471298618175523e-05, 'epoch': 0.13} + + 13%|█▎ | 966/7378 [3:19:05<21:55:17, 12.31s/it] + 13%|█▎ | 967/7378 [3:19:17<21:58:29, 12.34s/it] + +{'loss': 0.4899, 'learning_rate': 1.9469889123213585e-05, 'epoch': 0.13} + + 13%|█▎ | 967/7378 [3:19:17<21:58:29, 12.34s/it] + 13%|█▎ | 968/7378 [3:19:29<22:00:08, 12.36s/it] + +{'loss': 0.5925, 'learning_rate': 1.9468477803078926e-05, 'epoch': 0.13} + + 13%|█▎ | 968/7378 [3:19:29<22:00:08, 12.36s/it] + 13%|█▎ | 969/7378 [3:19:42<22:10:07, 12.45s/it] + +{'loss': 0.4885, 'learning_rate': 1.9467064658043556e-05, 'epoch': 0.13} + + 13%|█▎ | 969/7378 [3:19:42<22:10:07, 12.45s/it] + 13%|█▎ | 970/7378 [3:19:55<22:21:19, 12.56s/it] + +{'loss': 0.4753, 'learning_rate': 1.9465649688379837e-05, 'epoch': 0.13} + + 13%|█▎ | 970/7378 [3:19:55<22:21:19, 12.56s/it] + 13%|█▎ | 971/7378 [3:20:07<22:09:25, 12.45s/it] + +{'loss': 0.5736, 'learning_rate': 1.9464232894360483e-05, 'epoch': 0.13} + + 13%|█▎ | 971/7378 [3:20:07<22:09:25, 12.45s/it] + 13%|█▎ | 972/7378 [3:20:19<21:56:53, 12.33s/it] + +{'loss': 0.5311, 'learning_rate': 1.946281427625856e-05, 'epoch': 0.13} + + 13%|█▎ | 972/7378 [3:20:19<21:56:53, 
12.33s/it] + 13%|█▎ | 973/7378 [3:20:31<21:54:16, 12.31s/it] + +{'loss': 0.5372, 'learning_rate': 1.9461393834347488e-05, 'epoch': 0.13} + + 13%|█▎ | 973/7378 [3:20:31<21:54:16, 12.31s/it] + 13%|█▎ | 974/7378 [3:20:44<21:48:18, 12.26s/it] + +{'loss': 0.5652, 'learning_rate': 1.9459971568901026e-05, 'epoch': 0.13} + + 13%|█▎ | 974/7378 [3:20:44<21:48:18, 12.26s/it] + 13%|█▎ | 975/7378 [3:20:56<21:56:35, 12.34s/it] + +{'loss': 0.524, 'learning_rate': 1.94585474801933e-05, 'epoch': 0.13} + + 13%|█▎ | 975/7378 [3:20:56<21:56:35, 12.34s/it] + 13%|█▎ | 976/7378 [3:21:09<22:04:30, 12.41s/it] + +{'loss': 0.5318, 'learning_rate': 1.9457121568498778e-05, 'epoch': 0.13} + + 13%|█▎ | 976/7378 [3:21:09<22:04:30, 12.41s/it] + 13%|█▎ | 977/7378 [3:21:21<22:06:24, 12.43s/it] + +{'loss': 0.6062, 'learning_rate': 1.945569383409228e-05, 'epoch': 0.13} + + 13%|█▎ | 977/7378 [3:21:21<22:06:24, 12.43s/it] + 13%|█▎ | 978/7378 [3:21:33<21:43:05, 12.22s/it] + +{'loss': 0.4978, 'learning_rate': 1.9454264277248987e-05, 'epoch': 0.13} + + 13%|█▎ | 978/7378 [3:21:33<21:43:05, 12.22s/it] + 13%|█▎ | 979/7378 [3:21:45<21:42:28, 12.21s/it] + +{'loss': 0.5224, 'learning_rate': 1.945283289824442e-05, 'epoch': 0.13} + + 13%|█▎ | 979/7378 [3:21:45<21:42:28, 12.21s/it] + 13%|█▎ | 980/7378 [3:21:57<21:42:59, 12.22s/it] + +{'loss': 0.5578, 'learning_rate': 1.945139969735445e-05, 'epoch': 0.13} + + 13%|█▎ | 980/7378 [3:21:57<21:42:59, 12.22s/it] + 13%|█▎ | 981/7378 [3:22:10<21:47:24, 12.26s/it] + +{'loss': 0.5222, 'learning_rate': 1.9449964674855306e-05, 'epoch': 0.13} + + 13%|█▎ | 981/7378 [3:22:10<21:47:24, 12.26s/it] + 13%|█▎ | 982/7378 [3:22:22<21:55:00, 12.34s/it] + +{'loss': 0.522, 'learning_rate': 1.944852783102357e-05, 'epoch': 0.13} + + 13%|█▎ | 982/7378 [3:22:22<21:55:00, 12.34s/it] + 13%|█▎ | 983/7378 [3:22:34<21:52:11, 12.31s/it] + +{'loss': 0.5024, 'learning_rate': 1.9447089166136167e-05, 'epoch': 0.13} + + 13%|█▎ | 983/7378 [3:22:34<21:52:11, 12.31s/it] + 13%|█▎ | 984/7378 [3:22:47<21:53:10, 
12.32s/it] + +{'loss': 0.416, 'learning_rate': 1.9445648680470383e-05, 'epoch': 0.13} + + 13%|█▎ | 984/7378 [3:22:47<21:53:10, 12.32s/it] + 13%|█▎ | 985/7378 [3:22:59<21:54:24, 12.34s/it] + +{'loss': 0.5436, 'learning_rate': 1.944420637430384e-05, 'epoch': 0.13} + + 13%|█▎ | 985/7378 [3:22:59<21:54:24, 12.34s/it] + 13%|█▎ | 986/7378 [3:23:11<21:40:19, 12.21s/it] + +{'loss': 0.5229, 'learning_rate': 1.944276224791453e-05, 'epoch': 0.13} + + 13%|█▎ | 986/7378 [3:23:11<21:40:19, 12.21s/it] + 13%|█▎ | 987/7378 [3:23:23<21:37:48, 12.18s/it] + +{'loss': 0.5447, 'learning_rate': 1.9441316301580782e-05, 'epoch': 0.13} + + 13%|█▎ | 987/7378 [3:23:23<21:37:48, 12.18s/it] + 13%|█▎ | 988/7378 [3:23:35<21:37:08, 12.18s/it] + +{'loss': 0.5123, 'learning_rate': 1.9439868535581276e-05, 'epoch': 0.13} + + 13%|█▎ | 988/7378 [3:23:35<21:37:08, 12.18s/it] + 13%|█▎ | 989/7378 [3:23:48<21:45:24, 12.26s/it] + +{'loss': 0.5074, 'learning_rate': 1.9438418950195048e-05, 'epoch': 0.13} + + 13%|█▎ | 989/7378 [3:23:48<21:45:24, 12.26s/it] + 13%|█▎ | 990/7378 [3:24:00<21:58:48, 12.39s/it] + +{'loss': 0.5076, 'learning_rate': 1.9436967545701485e-05, 'epoch': 0.13} + + 13%|█▎ | 990/7378 [3:24:00<21:58:48, 12.39s/it] + 13%|█▎ | 991/7378 [3:24:13<21:57:42, 12.38s/it] + +{'loss': 0.5631, 'learning_rate': 1.9435514322380315e-05, 'epoch': 0.13} + + 13%|█▎ | 991/7378 [3:24:13<21:57:42, 12.38s/it] + 13%|█▎ | 992/7378 [3:24:25<21:52:38, 12.33s/it] + +{'loss': 0.4674, 'learning_rate': 1.9434059280511636e-05, 'epoch': 0.13} + + 13%|█▎ | 992/7378 [3:24:25<21:52:38, 12.33s/it] + 13%|█▎ | 993/7378 [3:24:37<21:40:49, 12.22s/it] + +{'loss': 0.5322, 'learning_rate': 1.9432602420375875e-05, 'epoch': 0.13} + + 13%|█▎ | 993/7378 [3:24:37<21:40:49, 12.22s/it] + 13%|█▎ | 994/7378 [3:24:49<21:47:30, 12.29s/it] + +{'loss': 0.4605, 'learning_rate': 1.9431143742253825e-05, 'epoch': 0.13} + + 13%|█▎ | 994/7378 [3:24:49<21:47:30, 12.29s/it] + 13%|█▎ | 995/7378 [3:25:02<21:45:34, 12.27s/it] + +{'loss': 0.5079, 
'learning_rate': 1.942968324642662e-05, 'epoch': 0.13} + + 13%|█▎ | 995/7378 [3:25:02<21:45:34, 12.27s/it] + 13%|█▎ | 996/7378 [3:25:14<21:34:01, 12.17s/it] + +{'loss': 0.5198, 'learning_rate': 1.9428220933175747e-05, 'epoch': 0.13} + + 13%|█▎ | 996/7378 [3:25:14<21:34:01, 12.17s/it] + 14%|█▎ | 997/7378 [3:25:26<21:52:46, 12.34s/it] + +{'loss': 0.3891, 'learning_rate': 1.942675680278305e-05, 'epoch': 0.14} + + 14%|█▎ | 997/7378 [3:25:26<21:52:46, 12.34s/it] + 14%|█▎ | 998/7378 [3:25:39<21:59:44, 12.41s/it] + +{'loss': 0.5728, 'learning_rate': 1.9425290855530705e-05, 'epoch': 0.14} + + 14%|█▎ | 998/7378 [3:25:39<21:59:44, 12.41s/it] + 14%|█▎ | 999/7378 [3:25:52<22:35:44, 12.75s/it] + +{'loss': 0.5324, 'learning_rate': 1.9423823091701262e-05, 'epoch': 0.14} + + 14%|█▎ | 999/7378 [3:25:52<22:35:44, 12.75s/it] + 14%|█▎ | 1000/7378 [3:26:04<22:09:20, 12.51s/it] + +{'loss': 0.5082, 'learning_rate': 1.9422353511577604e-05, 'epoch': 0.14} + + 14%|█▎ | 1000/7378 [3:26:04<22:09:20, 12.51s/it] + 14%|█▎ | 1001/7378 [3:26:16<21:53:52, 12.36s/it] + +{'loss': 0.4276, 'learning_rate': 1.9420882115442974e-05, 'epoch': 0.14} + + 14%|█▎ | 1001/7378 [3:26:16<21:53:52, 12.36s/it] + 14%|█▎ | 1002/7378 [3:26:29<21:47:02, 12.30s/it] + +{'loss': 0.4868, 'learning_rate': 1.9419408903580956e-05, 'epoch': 0.14} + + 14%|█▎ | 1002/7378 [3:26:29<21:47:02, 12.30s/it] + 14%|█▎ | 1003/7378 [3:26:41<21:38:14, 12.22s/it] + +{'loss': 0.5091, 'learning_rate': 1.941793387627549e-05, 'epoch': 0.14} + + 14%|█▎ | 1003/7378 [3:26:41<21:38:14, 12.22s/it] + 14%|█▎ | 1004/7378 [3:26:53<21:38:34, 12.22s/it] + +{'loss': 0.4772, 'learning_rate': 1.9416457033810864e-05, 'epoch': 0.14} + + 14%|█▎ | 1004/7378 [3:26:53<21:38:34, 12.22s/it] + 14%|█▎ | 1005/7378 [3:27:06<21:54:41, 12.38s/it] + +{'loss': 0.5359, 'learning_rate': 1.9414978376471714e-05, 'epoch': 0.14} + + 14%|█▎ | 1005/7378 [3:27:06<21:54:41, 12.38s/it] + 14%|█▎ | 1006/7378 [3:27:18<21:48:44, 12.32s/it] + +{'loss': 0.452, 'learning_rate': 
1.9413497904543033e-05, 'epoch': 0.14} + + 14%|█▎ | 1006/7378 [3:27:18<21:48:44, 12.32s/it] + 14%|█▎ | 1007/7378 [3:27:31<22:02:36, 12.46s/it] + +{'loss': 0.5024, 'learning_rate': 1.9412015618310156e-05, 'epoch': 0.14} + + 14%|█▎ | 1007/7378 [3:27:31<22:02:36, 12.46s/it] + 14%|█▎ | 1008/7378 [3:27:43<22:05:07, 12.48s/it] + +{'loss': 0.5356, 'learning_rate': 1.9410531518058772e-05, 'epoch': 0.14} + + 14%|█▎ | 1008/7378 [3:27:43<22:05:07, 12.48s/it] + 14%|█▎ | 1009/7378 [3:27:56<22:08:36, 12.52s/it] + +{'loss': 0.5172, 'learning_rate': 1.9409045604074916e-05, 'epoch': 0.14} + + 14%|█▎ | 1009/7378 [3:27:56<22:08:36, 12.52s/it] + 14%|█▎ | 1010/7378 [3:28:09<22:21:02, 12.64s/it] + +{'loss': 0.5194, 'learning_rate': 1.9407557876644974e-05, 'epoch': 0.14} + + 14%|█▎ | 1010/7378 [3:28:09<22:21:02, 12.64s/it] + 14%|█▎ | 1011/7378 [3:28:21<22:16:12, 12.59s/it] + +{'loss': 0.5455, 'learning_rate': 1.9406068336055686e-05, 'epoch': 0.14} + + 14%|█▎ | 1011/7378 [3:28:21<22:16:12, 12.59s/it] + 14%|█▎ | 1012/7378 [3:28:34<22:10:34, 12.54s/it] + +{'loss': 0.4247, 'learning_rate': 1.9404576982594135e-05, 'epoch': 0.14} + + 14%|█▎ | 1012/7378 [3:28:34<22:10:34, 12.54s/it] + 14%|█▎ | 1013/7378 [3:28:46<22:00:29, 12.45s/it] + +{'loss': 0.4885, 'learning_rate': 1.9403083816547758e-05, 'epoch': 0.14} + + 14%|█▎ | 1013/7378 [3:28:46<22:00:29, 12.45s/it] + 14%|█▎ | 1014/7378 [3:28:58<22:01:37, 12.46s/it] + +{'loss': 0.5438, 'learning_rate': 1.9401588838204334e-05, 'epoch': 0.14} + + 14%|█▎ | 1014/7378 [3:28:58<22:01:37, 12.46s/it] + 14%|█▍ | 1015/7378 [3:29:11<22:05:24, 12.50s/it] + +{'loss': 0.4954, 'learning_rate': 1.9400092047852e-05, 'epoch': 0.14} + + 14%|█▍ | 1015/7378 [3:29:11<22:05:24, 12.50s/it] + 14%|█▍ | 1016/7378 [3:29:23<21:57:38, 12.43s/it] + +{'loss': 0.4905, 'learning_rate': 1.9398593445779242e-05, 'epoch': 0.14} + + 14%|█▍ | 1016/7378 [3:29:23<21:57:38, 12.43s/it] + 14%|█▍ | 1017/7378 [3:29:35<21:55:29, 12.41s/it] + +{'loss': 0.4653, 'learning_rate': 
1.9397093032274888e-05, 'epoch': 0.14} + + 14%|█▍ | 1017/7378 [3:29:35<21:55:29, 12.41s/it] + 14%|█▍ | 1018/7378 [3:29:48<22:02:11, 12.47s/it] + +{'loss': 0.5399, 'learning_rate': 1.939559080762812e-05, 'epoch': 0.14} + + 14%|█▍ | 1018/7378 [3:29:48<22:02:11, 12.47s/it] + 14%|█▍ | 1019/7378 [3:30:00<21:51:27, 12.37s/it] + +{'loss': 0.4956, 'learning_rate': 1.9394086772128468e-05, 'epoch': 0.14} + + 14%|█▍ | 1019/7378 [3:30:00<21:51:27, 12.37s/it] + 14%|█▍ | 1020/7378 [3:30:13<21:50:09, 12.36s/it] + +{'loss': 0.5046, 'learning_rate': 1.9392580926065814e-05, 'epoch': 0.14} + + 14%|█▍ | 1020/7378 [3:30:13<21:50:09, 12.36s/it] + 14%|█▍ | 1021/7378 [3:30:25<21:49:58, 12.36s/it] + +{'loss': 0.5091, 'learning_rate': 1.9391073269730382e-05, 'epoch': 0.14} + + 14%|█▍ | 1021/7378 [3:30:25<21:49:58, 12.36s/it] + 14%|█▍ | 1022/7378 [3:30:37<21:49:23, 12.36s/it] + +{'loss': 0.5645, 'learning_rate': 1.9389563803412753e-05, 'epoch': 0.14} + + 14%|█▍ | 1022/7378 [3:30:37<21:49:23, 12.36s/it] + 14%|█▍ | 1023/7378 [3:30:49<21:40:25, 12.28s/it] + +{'loss': 0.4963, 'learning_rate': 1.9388052527403852e-05, 'epoch': 0.14} + + 14%|█▍ | 1023/7378 [3:30:49<21:40:25, 12.28s/it] + 14%|█▍ | 1024/7378 [3:31:02<21:39:29, 12.27s/it] + +{'loss': 0.5103, 'learning_rate': 1.9386539441994953e-05, 'epoch': 0.14} + + 14%|█▍ | 1024/7378 [3:31:02<21:39:29, 12.27s/it] + 14%|█▍ | 1025/7378 [3:31:14<21:32:12, 12.20s/it] + +{'loss': 0.4464, 'learning_rate': 1.9385024547477676e-05, 'epoch': 0.14} + + 14%|█▍ | 1025/7378 [3:31:14<21:32:12, 12.20s/it] + 14%|█▍ | 1026/7378 [3:31:26<21:22:58, 12.12s/it] + +{'loss': 0.4805, 'learning_rate': 1.9383507844144e-05, 'epoch': 0.14} + + 14%|█▍ | 1026/7378 [3:31:26<21:22:58, 12.12s/it] + 14%|█▍ | 1027/7378 [3:31:38<21:36:09, 12.25s/it] + +{'loss': 0.5144, 'learning_rate': 1.938198933228624e-05, 'epoch': 0.14} + + 14%|█▍ | 1027/7378 [3:31:38<21:36:09, 12.25s/it] + 14%|█▍ | 1028/7378 [3:31:50<21:38:59, 12.27s/it] + +{'loss': 0.4672, 'learning_rate': 1.9380469012197068e-05, 
'epoch': 0.14} + + 14%|█▍ | 1028/7378 [3:31:50<21:38:59, 12.27s/it] + 14%|█▍ | 1029/7378 [3:32:03<21:36:41, 12.25s/it] + +{'loss': 0.4537, 'learning_rate': 1.9378946884169502e-05, 'epoch': 0.14} + + 14%|█▍ | 1029/7378 [3:32:03<21:36:41, 12.25s/it] + 14%|█▍ | 1030/7378 [3:32:15<21:25:31, 12.15s/it] + +{'loss': 0.4613, 'learning_rate': 1.9377422948496912e-05, 'epoch': 0.14} + + 14%|█▍ | 1030/7378 [3:32:15<21:25:31, 12.15s/it] + 14%|█▍ | 1031/7378 [3:32:27<21:32:49, 12.22s/it] + +{'loss': 0.4925, 'learning_rate': 1.9375897205473005e-05, 'epoch': 0.14} + + 14%|█▍ | 1031/7378 [3:32:27<21:32:49, 12.22s/it] + 14%|█▍ | 1032/7378 [3:32:39<21:38:03, 12.27s/it] + +{'loss': 0.5178, 'learning_rate': 1.937436965539185e-05, 'epoch': 0.14} + + 14%|█▍ | 1032/7378 [3:32:39<21:38:03, 12.27s/it] + 14%|█▍ | 1033/7378 [3:32:52<21:33:43, 12.23s/it] + +{'loss': 0.5606, 'learning_rate': 1.9372840298547856e-05, 'epoch': 0.14} + + 14%|█▍ | 1033/7378 [3:32:52<21:33:43, 12.23s/it] + 14%|█▍ | 1034/7378 [3:33:04<21:43:24, 12.33s/it] + +{'loss': 0.5202, 'learning_rate': 1.937130913523578e-05, 'epoch': 0.14} + + 14%|█▍ | 1034/7378 [3:33:04<21:43:24, 12.33s/it] + 14%|█▍ | 1035/7378 [3:33:16<21:42:14, 12.32s/it] + +{'loss': 0.5412, 'learning_rate': 1.9369776165750734e-05, 'epoch': 0.14} + + 14%|█▍ | 1035/7378 [3:33:16<21:42:14, 12.32s/it] + 14%|█▍ | 1036/7378 [3:33:29<21:51:16, 12.41s/it] + +{'loss': 0.5367, 'learning_rate': 1.9368241390388172e-05, 'epoch': 0.14} + + 14%|█▍ | 1036/7378 [3:33:29<21:51:16, 12.41s/it] + 14%|█▍ | 1037/7378 [3:33:41<21:52:22, 12.42s/it] + +{'loss': 0.4477, 'learning_rate': 1.9366704809443898e-05, 'epoch': 0.14} + + 14%|█▍ | 1037/7378 [3:33:41<21:52:22, 12.42s/it] + 14%|█▍ | 1038/7378 [3:33:54<21:55:00, 12.44s/it] + +{'loss': 0.4891, 'learning_rate': 1.9365166423214065e-05, 'epoch': 0.14} + + 14%|█▍ | 1038/7378 [3:33:54<21:55:00, 12.44s/it] + 14%|█▍ | 1039/7378 [3:34:06<21:57:03, 12.47s/it] + +{'loss': 0.5375, 'learning_rate': 1.9363626231995175e-05, 'epoch': 0.14} + + 
14%|█▍ | 1039/7378 [3:34:06<21:57:03, 12.47s/it] + 14%|█▍ | 1040/7378 [3:34:19<21:46:00, 12.36s/it] + +{'loss': 0.4911, 'learning_rate': 1.936208423608407e-05, 'epoch': 0.14} + + 14%|█▍ | 1040/7378 [3:34:19<21:46:00, 12.36s/it] + 14%|█▍ | 1041/7378 [3:34:30<21:32:31, 12.24s/it] + +{'loss': 0.5594, 'learning_rate': 1.9360540435777944e-05, 'epoch': 0.14} + + 14%|█▍ | 1041/7378 [3:34:30<21:32:31, 12.24s/it] + 14%|█▍ | 1042/7378 [3:34:43<21:42:36, 12.34s/it] + +{'loss': 0.4855, 'learning_rate': 1.935899483137435e-05, 'epoch': 0.14} + + 14%|█▍ | 1042/7378 [3:34:43<21:42:36, 12.34s/it] + 14%|█▍ | 1043/7378 [3:34:55<21:40:20, 12.32s/it] + +{'loss': 0.4591, 'learning_rate': 1.9357447423171173e-05, 'epoch': 0.14} + + 14%|█▍ | 1043/7378 [3:34:55<21:40:20, 12.32s/it] + 14%|█▍ | 1044/7378 [3:35:08<21:41:43, 12.33s/it] + +{'loss': 0.494, 'learning_rate': 1.9355898211466647e-05, 'epoch': 0.14} + + 14%|█▍ | 1044/7378 [3:35:08<21:41:43, 12.33s/it] + 14%|█▍ | 1045/7378 [3:35:20<21:28:14, 12.20s/it] + +{'loss': 0.5546, 'learning_rate': 1.935434719655937e-05, 'epoch': 0.14} + + 14%|█▍ | 1045/7378 [3:35:20<21:28:14, 12.20s/it] + 14%|█▍ | 1046/7378 [3:35:32<21:34:41, 12.27s/it] + +{'loss': 0.4739, 'learning_rate': 1.9352794378748267e-05, 'epoch': 0.14} + + 14%|█▍ | 1046/7378 [3:35:32<21:34:41, 12.27s/it] + 14%|█▍ | 1047/7378 [3:35:44<21:37:40, 12.30s/it] + +{'loss': 0.4895, 'learning_rate': 1.935123975833262e-05, 'epoch': 0.14} + + 14%|█▍ | 1047/7378 [3:35:44<21:37:40, 12.30s/it] + 14%|█▍ | 1048/7378 [3:35:57<21:37:16, 12.30s/it] + +{'loss': 0.4879, 'learning_rate': 1.9349683335612064e-05, 'epoch': 0.14} + + 14%|█▍ | 1048/7378 [3:35:57<21:37:16, 12.30s/it] + 14%|█▍ | 1049/7378 [3:36:09<21:38:37, 12.31s/it] + +{'loss': 0.4601, 'learning_rate': 1.9348125110886564e-05, 'epoch': 0.14} + + 14%|█▍ | 1049/7378 [3:36:09<21:38:37, 12.31s/it] + 14%|█▍ | 1050/7378 [3:36:21<21:32:36, 12.26s/it] + +{'loss': 0.5488, 'learning_rate': 1.9346565084456455e-05, 'epoch': 0.14} + + 14%|█▍ | 1050/7378 
[3:36:21<21:32:36, 12.26s/it] + 14%|█▍ | 1051/7378 [3:36:33<21:29:37, 12.23s/it] + +{'loss': 0.4997, 'learning_rate': 1.93450032566224e-05, 'epoch': 0.14} + + 14%|█▍ | 1051/7378 [3:36:33<21:29:37, 12.23s/it] + 14%|█▍ | 1052/7378 [3:36:46<21:38:19, 12.31s/it] + +{'loss': 0.4929, 'learning_rate': 1.9343439627685422e-05, 'epoch': 0.14} + + 14%|█▍ | 1052/7378 [3:36:46<21:38:19, 12.31s/it] + 14%|█▍ | 1053/7378 [3:36:58<21:35:43, 12.29s/it] + +{'loss': 0.5239, 'learning_rate': 1.934187419794688e-05, 'epoch': 0.14} + + 14%|█▍ | 1053/7378 [3:36:58<21:35:43, 12.29s/it] + 14%|█▍ | 1054/7378 [3:37:11<21:42:06, 12.35s/it] + +{'loss': 0.5378, 'learning_rate': 1.934030696770849e-05, 'epoch': 0.14} + + 14%|█▍ | 1054/7378 [3:37:11<21:42:06, 12.35s/it] + 14%|█▍ | 1055/7378 [3:37:23<21:45:45, 12.39s/it] + +{'loss': 0.5266, 'learning_rate': 1.933873793727231e-05, 'epoch': 0.14} + + 14%|█▍ | 1055/7378 [3:37:23<21:45:45, 12.39s/it] + 14%|█▍ | 1056/7378 [3:37:35<21:45:47, 12.39s/it] + +{'loss': 0.4376, 'learning_rate': 1.9337167106940747e-05, 'epoch': 0.14} + + 14%|█▍ | 1056/7378 [3:37:35<21:45:47, 12.39s/it] + 14%|█▍ | 1057/7378 [3:37:48<21:43:17, 12.37s/it] + +{'loss': 0.4672, 'learning_rate': 1.9335594477016557e-05, 'epoch': 0.14} + + 14%|█▍ | 1057/7378 [3:37:48<21:43:17, 12.37s/it] + 14%|█▍ | 1058/7378 [3:38:00<21:31:52, 12.26s/it] + +{'loss': 0.5668, 'learning_rate': 1.9334020047802833e-05, 'epoch': 0.14} + + 14%|█▍ | 1058/7378 [3:38:00<21:31:52, 12.26s/it] + 14%|█▍ | 1059/7378 [3:38:12<21:38:11, 12.33s/it] + +{'loss': 0.5386, 'learning_rate': 1.9332443819603024e-05, 'epoch': 0.14} + + 14%|█▍ | 1059/7378 [3:38:12<21:38:11, 12.33s/it] + 14%|█▍ | 1060/7378 [3:38:24<21:25:24, 12.21s/it] + +{'loss': 0.5137, 'learning_rate': 1.9330865792720926e-05, 'epoch': 0.14} + + 14%|█▍ | 1060/7378 [3:38:24<21:25:24, 12.21s/it] + 14%|█▍ | 1061/7378 [3:38:37<21:43:47, 12.38s/it] + +{'loss': 0.5311, 'learning_rate': 1.9329285967460673e-05, 'epoch': 0.14} + + 14%|█▍ | 1061/7378 [3:38:37<21:43:47, 
12.38s/it] + 14%|█▍ | 1062/7378 [3:38:49<21:41:44, 12.37s/it] + +{'loss': 0.5144, 'learning_rate': 1.932770434412676e-05, 'epoch': 0.14} + + 14%|█▍ | 1062/7378 [3:38:49<21:41:44, 12.37s/it] + 14%|█▍ | 1063/7378 [3:39:02<21:40:48, 12.36s/it] + +{'loss': 0.5417, 'learning_rate': 1.9326120923024013e-05, 'epoch': 0.14} + + 14%|█▍ | 1063/7378 [3:39:02<21:40:48, 12.36s/it] + 14%|█▍ | 1064/7378 [3:39:14<21:34:59, 12.31s/it] + +{'loss': 0.5368, 'learning_rate': 1.9324535704457617e-05, 'epoch': 0.14} + + 14%|█▍ | 1064/7378 [3:39:14<21:34:59, 12.31s/it] + 14%|█▍ | 1065/7378 [3:39:26<21:34:47, 12.31s/it] + +{'loss': 0.4831, 'learning_rate': 1.9322948688733093e-05, 'epoch': 0.14} + + 14%|█▍ | 1065/7378 [3:39:26<21:34:47, 12.31s/it] + 14%|█▍ | 1066/7378 [3:39:38<21:26:32, 12.23s/it] + +{'loss': 0.4582, 'learning_rate': 1.9321359876156314e-05, 'epoch': 0.14} + + 14%|█▍ | 1066/7378 [3:39:38<21:26:32, 12.23s/it] + 14%|█▍ | 1067/7378 [3:39:51<21:30:21, 12.27s/it] + +{'loss': 0.4621, 'learning_rate': 1.9319769267033502e-05, 'epoch': 0.14} + + 14%|█▍ | 1067/7378 [3:39:51<21:30:21, 12.27s/it] + 14%|█▍ | 1068/7378 [3:40:03<21:32:42, 12.29s/it] + +{'loss': 0.4946, 'learning_rate': 1.931817686167122e-05, 'epoch': 0.14} + + 14%|█▍ | 1068/7378 [3:40:03<21:32:42, 12.29s/it] + 14%|█▍ | 1069/7378 [3:40:15<21:30:33, 12.27s/it] + +{'loss': 0.5015, 'learning_rate': 1.9316582660376384e-05, 'epoch': 0.14} + + 14%|█▍ | 1069/7378 [3:40:15<21:30:33, 12.27s/it] + 15%|█▍ | 1070/7378 [3:40:27<21:30:47, 12.28s/it] + +{'loss': 0.5329, 'learning_rate': 1.931498666345624e-05, 'epoch': 0.15} + + 15%|█▍ | 1070/7378 [3:40:27<21:30:47, 12.28s/it] + 15%|█▍ | 1071/7378 [3:40:40<21:36:19, 12.33s/it] + +{'loss': 0.5116, 'learning_rate': 1.9313388871218405e-05, 'epoch': 0.15} + + 15%|█▍ | 1071/7378 [3:40:40<21:36:19, 12.33s/it] + 15%|█▍ | 1072/7378 [3:40:52<21:30:40, 12.28s/it] + +{'loss': 0.4691, 'learning_rate': 1.9311789283970818e-05, 'epoch': 0.15} + + 15%|█▍ | 1072/7378 [3:40:52<21:30:40, 12.28s/it] + 15%|█▍ | 
1073/7378 [3:41:05<21:43:13, 12.40s/it] + +{'loss': 0.4897, 'learning_rate': 1.9310187902021775e-05, 'epoch': 0.15} + + 15%|█▍ | 1073/7378 [3:41:05<21:43:13, 12.40s/it] + 15%|█▍ | 1074/7378 [3:41:17<21:50:54, 12.48s/it] + +{'loss': 0.5021, 'learning_rate': 1.9308584725679926e-05, 'epoch': 0.15} + + 15%|█▍ | 1074/7378 [3:41:17<21:50:54, 12.48s/it] + 15%|█▍ | 1075/7378 [3:41:30<21:42:24, 12.40s/it] + +{'loss': 0.5143, 'learning_rate': 1.930697975525425e-05, 'epoch': 0.15} + + 15%|█▍ | 1075/7378 [3:41:30<21:42:24, 12.40s/it] + 15%|█▍ | 1076/7378 [3:41:42<21:48:07, 12.45s/it] + +{'loss': 0.4682, 'learning_rate': 1.9305372991054078e-05, 'epoch': 0.15} + + 15%|█▍ | 1076/7378 [3:41:42<21:48:07, 12.45s/it] + 15%|█▍ | 1077/7378 [3:41:55<22:02:19, 12.59s/it] + +{'loss': 0.5093, 'learning_rate': 1.93037644333891e-05, 'epoch': 0.15} + + 15%|█▍ | 1077/7378 [3:41:55<22:02:19, 12.59s/it] + 15%|█▍ | 1078/7378 [3:42:08<21:57:01, 12.54s/it] + +{'loss': 0.532, 'learning_rate': 1.9302154082569328e-05, 'epoch': 0.15} + + 15%|█▍ | 1078/7378 [3:42:08<21:57:01, 12.54s/it] + 15%|█▍ | 1079/7378 [3:42:20<21:56:40, 12.54s/it] + +{'loss': 0.4158, 'learning_rate': 1.930054193890514e-05, 'epoch': 0.15} + + 15%|█▍ | 1079/7378 [3:42:20<21:56:40, 12.54s/it] + 15%|█▍ | 1080/7378 [3:42:32<21:28:39, 12.28s/it] + +{'loss': 0.4869, 'learning_rate': 1.929892800270725e-05, 'epoch': 0.15} + + 15%|█▍ | 1080/7378 [3:42:32<21:28:39, 12.28s/it] + 15%|█▍ | 1081/7378 [3:42:44<21:23:28, 12.23s/it] + +{'loss': 0.4603, 'learning_rate': 1.9297312274286716e-05, 'epoch': 0.15} + + 15%|█▍ | 1081/7378 [3:42:44<21:23:28, 12.23s/it] + 15%|█▍ | 1082/7378 [3:42:56<21:22:28, 12.22s/it] + +{'loss': 0.4405, 'learning_rate': 1.9295694753954942e-05, 'epoch': 0.15} + + 15%|█▍ | 1082/7378 [3:42:56<21:22:28, 12.22s/it] + 15%|█▍ | 1083/7378 [3:43:09<21:42:10, 12.41s/it] + +{'loss': 0.4839, 'learning_rate': 1.9294075442023687e-05, 'epoch': 0.15} + + 15%|█▍ | 1083/7378 [3:43:09<21:42:10, 12.41s/it] + 15%|█▍ | 1084/7378 
[3:43:21<21:22:15, 12.22s/it] + +{'loss': 0.5094, 'learning_rate': 1.9292454338805044e-05, 'epoch': 0.15} + + 15%|█▍ | 1084/7378 [3:43:21<21:22:15, 12.22s/it] + 15%|█▍ | 1085/7378 [3:43:33<21:30:44, 12.31s/it] + +{'loss': 0.4637, 'learning_rate': 1.9290831444611456e-05, 'epoch': 0.15} + + 15%|█▍ | 1085/7378 [3:43:33<21:30:44, 12.31s/it] + 15%|█▍ | 1086/7378 [3:43:46<21:42:10, 12.42s/it] + +{'loss': 0.5441, 'learning_rate': 1.928920675975571e-05, 'epoch': 0.15} + + 15%|█▍ | 1086/7378 [3:43:46<21:42:10, 12.42s/it] + 15%|█▍ | 1087/7378 [3:43:58<21:42:56, 12.43s/it] + +{'loss': 0.5106, 'learning_rate': 1.9287580284550937e-05, 'epoch': 0.15} + + 15%|█▍ | 1087/7378 [3:43:58<21:42:56, 12.43s/it] + 15%|█▍ | 1088/7378 [3:44:11<21:36:26, 12.37s/it] + +{'loss': 0.5043, 'learning_rate': 1.928595201931062e-05, 'epoch': 0.15} + + 15%|█▍ | 1088/7378 [3:44:11<21:36:26, 12.37s/it] + 15%|█▍ | 1089/7378 [3:44:23<21:54:30, 12.54s/it] + +{'loss': 0.4739, 'learning_rate': 1.9284321964348574e-05, 'epoch': 0.15} + + 15%|█▍ | 1089/7378 [3:44:23<21:54:30, 12.54s/it] + 15%|█▍ | 1090/7378 [3:44:36<21:54:39, 12.54s/it] + +{'loss': 0.5538, 'learning_rate': 1.928269011997897e-05, 'epoch': 0.15} + + 15%|█▍ | 1090/7378 [3:44:36<21:54:39, 12.54s/it] + 15%|█▍ | 1091/7378 [3:44:48<21:51:45, 12.52s/it] + +{'loss': 0.5177, 'learning_rate': 1.9281056486516325e-05, 'epoch': 0.15} + + 15%|█▍ | 1091/7378 [3:44:48<21:51:45, 12.52s/it] + 15%|█▍ | 1092/7378 [3:45:01<21:59:34, 12.60s/it] + +{'loss': 0.5618, 'learning_rate': 1.927942106427549e-05, 'epoch': 0.15} + + 15%|█▍ | 1092/7378 [3:45:01<21:59:34, 12.60s/it] + 15%|█▍ | 1093/7378 [3:45:14<22:04:32, 12.64s/it] + +{'loss': 0.5153, 'learning_rate': 1.9277783853571673e-05, 'epoch': 0.15} + + 15%|█▍ | 1093/7378 [3:45:14<22:04:32, 12.64s/it] + 15%|█▍ | 1094/7378 [3:45:26<21:44:48, 12.46s/it] + +{'loss': 0.4233, 'learning_rate': 1.9276144854720412e-05, 'epoch': 0.15} + + 15%|█▍ | 1094/7378 [3:45:26<21:44:48, 12.46s/it] + 15%|█▍ | 1095/7378 [3:45:39<21:46:34, 
12.48s/it] + +{'loss': 0.459, 'learning_rate': 1.9274504068037604e-05, 'epoch': 0.15} + + 15%|█▍ | 1095/7378 [3:45:39<21:46:34, 12.48s/it] + 15%|█▍ | 1096/7378 [3:45:51<21:53:40, 12.55s/it] + +{'loss': 0.4953, 'learning_rate': 1.9272861493839483e-05, 'epoch': 0.15} + + 15%|█▍ | 1096/7378 [3:45:51<21:53:40, 12.55s/it] + 15%|█▍ | 1097/7378 [3:46:04<21:45:35, 12.47s/it] + +{'loss': 0.461, 'learning_rate': 1.9271217132442633e-05, 'epoch': 0.15} + + 15%|█▍ | 1097/7378 [3:46:04<21:45:35, 12.47s/it] + 15%|█▍ | 1098/7378 [3:46:16<21:39:31, 12.42s/it] + +{'loss': 0.5163, 'learning_rate': 1.9269570984163974e-05, 'epoch': 0.15} + + 15%|█▍ | 1098/7378 [3:46:16<21:39:31, 12.42s/it] + 15%|█▍ | 1099/7378 [3:46:28<21:38:28, 12.41s/it] + +{'loss': 0.5297, 'learning_rate': 1.926792304932078e-05, 'epoch': 0.15} + + 15%|█▍ | 1099/7378 [3:46:28<21:38:28, 12.41s/it] + 15%|█▍ | 1100/7378 [3:46:40<21:29:00, 12.32s/it] + +{'loss': 0.5153, 'learning_rate': 1.926627332823066e-05, 'epoch': 0.15} + + 15%|█▍ | 1100/7378 [3:46:40<21:29:00, 12.32s/it] + 15%|█▍ | 1101/7378 [3:46:53<21:31:07, 12.34s/it] + +{'loss': 0.4888, 'learning_rate': 1.9264621821211577e-05, 'epoch': 0.15} + + 15%|█▍ | 1101/7378 [3:46:53<21:31:07, 12.34s/it] + 15%|█▍ | 1102/7378 [3:47:06<21:47:01, 12.50s/it] + +{'loss': 0.5234, 'learning_rate': 1.9262968528581828e-05, 'epoch': 0.15} + + 15%|█▍ | 1102/7378 [3:47:06<21:47:01, 12.50s/it] + 15%|█▍ | 1103/7378 [3:47:18<21:43:47, 12.47s/it] + +{'loss': 0.4971, 'learning_rate': 1.926131345066006e-05, 'epoch': 0.15} + + 15%|█▍ | 1103/7378 [3:47:18<21:43:47, 12.47s/it] + 15%|█▍ | 1104/7378 [3:47:30<21:40:12, 12.43s/it] + +{'loss': 0.5368, 'learning_rate': 1.925965658776527e-05, 'epoch': 0.15} + + 15%|█▍ | 1104/7378 [3:47:30<21:40:12, 12.43s/it] + 15%|█▍ | 1105/7378 [3:47:43<21:32:28, 12.36s/it] + +{'loss': 0.4606, 'learning_rate': 1.9257997940216783e-05, 'epoch': 0.15} + + 15%|█▍ | 1105/7378 [3:47:43<21:32:28, 12.36s/it] + 15%|█▍ | 1106/7378 [3:47:55<21:31:10, 12.35s/it] + +{'loss': 
0.5185, 'learning_rate': 1.9256337508334286e-05, 'epoch': 0.15} + + 15%|█▍ | 1106/7378 [3:47:55<21:31:10, 12.35s/it] + 15%|█▌ | 1107/7378 [3:48:07<21:38:26, 12.42s/it] + +{'loss': 0.5065, 'learning_rate': 1.925467529243779e-05, 'epoch': 0.15} + + 15%|█▌ | 1107/7378 [3:48:07<21:38:26, 12.42s/it] + 15%|█▌ | 1108/7378 [3:48:20<21:33:27, 12.38s/it] + +{'loss': 0.5368, 'learning_rate': 1.9253011292847672e-05, 'epoch': 0.15} + + 15%|█▌ | 1108/7378 [3:48:20<21:33:27, 12.38s/it] + 15%|█▌ | 1109/7378 [3:48:32<21:41:47, 12.46s/it] + +{'loss': 0.4713, 'learning_rate': 1.9251345509884638e-05, 'epoch': 0.15} + + 15%|█▌ | 1109/7378 [3:48:32<21:41:47, 12.46s/it] + 15%|█▌ | 1110/7378 [3:48:45<21:38:56, 12.43s/it] + +{'loss': 0.4713, 'learning_rate': 1.9249677943869742e-05, 'epoch': 0.15} + + 15%|█▌ | 1110/7378 [3:48:45<21:38:56, 12.43s/it] + 15%|█▌ | 1111/7378 [3:48:57<21:36:39, 12.41s/it] + +{'loss': 0.4314, 'learning_rate': 1.9248008595124378e-05, 'epoch': 0.15} + + 15%|█▌ | 1111/7378 [3:48:57<21:36:39, 12.41s/it] + 15%|█▌ | 1112/7378 [3:49:10<21:40:52, 12.46s/it] + +{'loss': 0.5309, 'learning_rate': 1.924633746397029e-05, 'epoch': 0.15} + + 15%|█▌ | 1112/7378 [3:49:10<21:40:52, 12.46s/it] + 15%|█▌ | 1113/7378 [3:49:22<21:39:11, 12.44s/it] + +{'loss': 0.5475, 'learning_rate': 1.924466455072956e-05, 'epoch': 0.15} + + 15%|█▌ | 1113/7378 [3:49:22<21:39:11, 12.44s/it] + 15%|█▌ | 1114/7378 [3:49:35<21:42:05, 12.47s/it] + +{'loss': 0.471, 'learning_rate': 1.924298985572462e-05, 'epoch': 0.15} + + 15%|█▌ | 1114/7378 [3:49:35<21:42:05, 12.47s/it] + 15%|█▌ | 1115/7378 [3:49:47<21:48:39, 12.54s/it] + +{'loss': 0.5151, 'learning_rate': 1.9241313379278237e-05, 'epoch': 0.15} + + 15%|█▌ | 1115/7378 [3:49:47<21:48:39, 12.54s/it] + 15%|█▌ | 1116/7378 [3:50:00<21:52:55, 12.58s/it] + +{'loss': 0.5126, 'learning_rate': 1.923963512171353e-05, 'epoch': 0.15} + + 15%|█▌ | 1116/7378 [3:50:00<21:52:55, 12.58s/it] + 15%|█▌ | 1117/7378 [3:50:12<21:40:24, 12.46s/it] + +{'loss': 0.4361, 'learning_rate': 
1.923795508335395e-05, 'epoch': 0.15} + + 15%|█▌ | 1117/7378 [3:50:12<21:40:24, 12.46s/it] + 15%|█▌ | 1118/7378 [3:50:25<21:40:34, 12.47s/it] + +{'loss': 0.5033, 'learning_rate': 1.9236273264523304e-05, 'epoch': 0.15} + + 15%|█▌ | 1118/7378 [3:50:25<21:40:34, 12.47s/it] + 15%|█▌ | 1119/7378 [3:50:37<21:30:35, 12.37s/it] + +{'loss': 0.5102, 'learning_rate': 1.9234589665545734e-05, 'epoch': 0.15} + + 15%|█▌ | 1119/7378 [3:50:37<21:30:35, 12.37s/it] + 15%|█▌ | 1120/7378 [3:50:49<21:36:48, 12.43s/it] + +{'loss': 0.5035, 'learning_rate': 1.923290428674573e-05, 'epoch': 0.15} + + 15%|█▌ | 1120/7378 [3:50:49<21:36:48, 12.43s/it] + 15%|█▌ | 1121/7378 [3:51:02<21:37:38, 12.44s/it] + +{'loss': 0.4799, 'learning_rate': 1.9231217128448118e-05, 'epoch': 0.15} + + 15%|█▌ | 1121/7378 [3:51:02<21:37:38, 12.44s/it] + 15%|█▌ | 1122/7378 [3:51:14<21:31:50, 12.39s/it] + +{'loss': 0.5039, 'learning_rate': 1.9229528190978072e-05, 'epoch': 0.15} + + 15%|█▌ | 1122/7378 [3:51:14<21:31:50, 12.39s/it] + 15%|█▌ | 1123/7378 [3:51:26<21:30:05, 12.37s/it] + +{'loss': 0.5506, 'learning_rate': 1.9227837474661113e-05, 'epoch': 0.15} + + 15%|█▌ | 1123/7378 [3:51:26<21:30:05, 12.37s/it] + 15%|█▌ | 1124/7378 [3:51:39<21:28:43, 12.36s/it] + +{'loss': 0.4998, 'learning_rate': 1.9226144979823094e-05, 'epoch': 0.15} + + 15%|█▌ | 1124/7378 [3:51:39<21:28:43, 12.36s/it] + 15%|█▌ | 1125/7378 [3:51:52<21:47:28, 12.55s/it] + +{'loss': 0.491, 'learning_rate': 1.9224450706790222e-05, 'epoch': 0.15} + + 15%|█▌ | 1125/7378 [3:51:52<21:47:28, 12.55s/it] + 15%|█▌ | 1126/7378 [3:52:04<21:43:13, 12.51s/it] + +{'loss': 0.4795, 'learning_rate': 1.9222754655889035e-05, 'epoch': 0.15} + + 15%|█▌ | 1126/7378 [3:52:04<21:43:13, 12.51s/it] + 15%|█▌ | 1127/7378 [3:52:16<21:30:01, 12.38s/it] + +{'loss': 0.4274, 'learning_rate': 1.9221056827446426e-05, 'epoch': 0.15} + + 15%|█▌ | 1127/7378 [3:52:16<21:30:01, 12.38s/it] + 15%|█▌ | 1128/7378 [3:52:28<21:22:03, 12.31s/it] + +{'loss': 0.4673, 'learning_rate': 
1.9219357221789624e-05, 'epoch': 0.15} + + 15%|█▌ | 1128/7378 [3:52:28<21:22:03, 12.31s/it] + 15%|█▌ | 1129/7378 [3:52:41<21:20:29, 12.29s/it] + +{'loss': 0.4973, 'learning_rate': 1.9217655839246198e-05, 'epoch': 0.15} + + 15%|█▌ | 1129/7378 [3:52:41<21:20:29, 12.29s/it] + 15%|█▌ | 1130/7378 [3:52:53<21:20:47, 12.30s/it] + +{'loss': 0.4497, 'learning_rate': 1.9215952680144067e-05, 'epoch': 0.15} + + 15%|█▌ | 1130/7378 [3:52:53<21:20:47, 12.30s/it] + 15%|█▌ | 1131/7378 [3:53:06<21:31:09, 12.40s/it] + +{'loss': 0.5, 'learning_rate': 1.9214247744811488e-05, 'epoch': 0.15} + + 15%|█▌ | 1131/7378 [3:53:06<21:31:09, 12.40s/it] + 15%|█▌ | 1132/7378 [3:53:18<21:38:10, 12.47s/it] + +{'loss': 0.5368, 'learning_rate': 1.921254103357706e-05, 'epoch': 0.15} + + 15%|█▌ | 1132/7378 [3:53:18<21:38:10, 12.47s/it] + 15%|█▌ | 1133/7378 [3:53:30<21:21:05, 12.31s/it] + +{'loss': 0.4839, 'learning_rate': 1.921083254676972e-05, 'epoch': 0.15} + + 15%|█▌ | 1133/7378 [3:53:30<21:21:05, 12.31s/it] + 15%|█▌ | 1134/7378 [3:53:43<21:26:26, 12.36s/it] + +{'loss': 0.5396, 'learning_rate': 1.9209122284718757e-05, 'epoch': 0.15} + + 15%|█▌ | 1134/7378 [3:53:43<21:26:26, 12.36s/it] + 15%|█▌ | 1135/7378 [3:53:55<21:13:39, 12.24s/it] + +{'loss': 0.475, 'learning_rate': 1.9207410247753795e-05, 'epoch': 0.15} + + 15%|█▌ | 1135/7378 [3:53:55<21:13:39, 12.24s/it] + 15%|█▌ | 1136/7378 [3:54:07<21:07:59, 12.19s/it] + +{'loss': 0.5118, 'learning_rate': 1.9205696436204807e-05, 'epoch': 0.15} + + 15%|█▌ | 1136/7378 [3:54:07<21:07:59, 12.19s/it] + 15%|█▌ | 1137/7378 [3:54:19<21:11:41, 12.23s/it] + +{'loss': 0.45, 'learning_rate': 1.92039808504021e-05, 'epoch': 0.15} + + 15%|█▌ | 1137/7378 [3:54:19<21:11:41, 12.23s/it] + 15%|█▌ | 1138/7378 [3:54:32<21:22:19, 12.33s/it] + +{'loss': 0.4859, 'learning_rate': 1.9202263490676323e-05, 'epoch': 0.15} + + 15%|█▌ | 1138/7378 [3:54:32<21:22:19, 12.33s/it] + 15%|█▌ | 1139/7378 [3:54:44<21:18:58, 12.30s/it] + +{'loss': 0.4928, 'learning_rate': 1.920054435735847e-05, 
'epoch': 0.15} + + 15%|█▌ | 1139/7378 [3:54:44<21:18:58, 12.30s/it] + 15%|█▌ | 1140/7378 [3:54:56<21:19:45, 12.31s/it] + +{'loss': 0.5045, 'learning_rate': 1.919882345077989e-05, 'epoch': 0.15} + + 15%|█▌ | 1140/7378 [3:54:56<21:19:45, 12.31s/it] + 15%|█▌ | 1141/7378 [3:55:10<21:56:29, 12.66s/it] + +{'loss': 0.5549, 'learning_rate': 1.9197100771272243e-05, 'epoch': 0.15} + + 15%|█▌ | 1141/7378 [3:55:10<21:56:29, 12.66s/it] + 15%|█▌ | 1142/7378 [3:55:22<21:42:54, 12.54s/it] + +{'loss': 0.4735, 'learning_rate': 1.919537631916756e-05, 'epoch': 0.15} + + 15%|█▌ | 1142/7378 [3:55:22<21:42:54, 12.54s/it] + 15%|█▌ | 1143/7378 [3:55:34<21:36:01, 12.47s/it] + +{'loss': 0.4796, 'learning_rate': 1.9193650094798198e-05, 'epoch': 0.15} + + 15%|█▌ | 1143/7378 [3:55:34<21:36:01, 12.47s/it] + 16%|█▌ | 1144/7378 [3:55:47<21:33:15, 12.45s/it] + +{'loss': 0.5251, 'learning_rate': 1.919192209849686e-05, 'epoch': 0.16} + + 16%|█▌ | 1144/7378 [3:55:47<21:33:15, 12.45s/it] + 16%|█▌ | 1145/7378 [3:55:59<21:25:49, 12.38s/it] + +{'loss': 0.4447, 'learning_rate': 1.919019233059659e-05, 'epoch': 0.16} + + 16%|█▌ | 1145/7378 [3:55:59<21:25:49, 12.38s/it] + 16%|█▌ | 1146/7378 [3:56:12<21:43:07, 12.55s/it] + +{'loss': 0.5038, 'learning_rate': 1.9188460791430775e-05, 'epoch': 0.16} + + 16%|█▌ | 1146/7378 [3:56:12<21:43:07, 12.55s/it] + 16%|█▌ | 1147/7378 [3:56:24<21:32:45, 12.45s/it] + +{'loss': 0.5011, 'learning_rate': 1.918672748133314e-05, 'epoch': 0.16} + + 16%|█▌ | 1147/7378 [3:56:24<21:32:45, 12.45s/it] + 16%|█▌ | 1148/7378 [3:56:37<21:42:39, 12.55s/it] + +{'loss': 0.5007, 'learning_rate': 1.9184992400637753e-05, 'epoch': 0.16} + + 16%|█▌ | 1148/7378 [3:56:37<21:42:39, 12.55s/it] + 16%|█▌ | 1149/7378 [3:56:49<21:38:06, 12.50s/it] + +{'loss': 0.5646, 'learning_rate': 1.9183255549679033e-05, 'epoch': 0.16} + + 16%|█▌ | 1149/7378 [3:56:49<21:38:06, 12.50s/it] + 16%|█▌ | 1150/7378 [3:57:01<21:26:17, 12.39s/it] + +{'loss': 0.5123, 'learning_rate': 1.9181516928791715e-05, 'epoch': 0.16} + + 
16%|█▌ | 1150/7378 [3:57:01<21:26:17, 12.39s/it] + 16%|█▌ | 1151/7378 [3:57:13<21:18:01, 12.31s/it] + +{'loss': 0.5239, 'learning_rate': 1.9179776538310902e-05, 'epoch': 0.16} + + 16%|█▌ | 1151/7378 [3:57:13<21:18:01, 12.31s/it] + 16%|█▌ | 1152/7378 [3:57:26<21:21:16, 12.35s/it] + +{'loss': 0.4438, 'learning_rate': 1.9178034378572023e-05, 'epoch': 0.16} + + 16%|█▌ | 1152/7378 [3:57:26<21:21:16, 12.35s/it] + 16%|█▌ | 1153/7378 [3:57:38<21:19:11, 12.33s/it] + +{'loss': 0.5255, 'learning_rate': 1.9176290449910854e-05, 'epoch': 0.16} + + 16%|█▌ | 1153/7378 [3:57:38<21:19:11, 12.33s/it] + 16%|█▌ | 1154/7378 [3:57:50<21:18:22, 12.32s/it] + +{'loss': 0.5298, 'learning_rate': 1.9174544752663507e-05, 'epoch': 0.16} + + 16%|█▌ | 1154/7378 [3:57:50<21:18:22, 12.32s/it] + 16%|█▌ | 1155/7378 [3:58:02<21:06:48, 12.21s/it] + +{'loss': 0.5196, 'learning_rate': 1.917279728716644e-05, 'epoch': 0.16} + + 16%|█▌ | 1155/7378 [3:58:02<21:06:48, 12.21s/it] + 16%|█▌ | 1156/7378 [3:58:15<21:14:16, 12.29s/it] + +{'loss': 0.4476, 'learning_rate': 1.9171048053756453e-05, 'epoch': 0.16} + + 16%|█▌ | 1156/7378 [3:58:15<21:14:16, 12.29s/it] + 16%|█▌ | 1157/7378 [3:58:28<21:25:48, 12.40s/it] + +{'loss': 0.5423, 'learning_rate': 1.9169297052770676e-05, 'epoch': 0.16} + + 16%|█▌ | 1157/7378 [3:58:28<21:25:48, 12.40s/it] + 16%|█▌ | 1158/7378 [3:58:39<21:08:18, 12.23s/it] + +{'loss': 0.4575, 'learning_rate': 1.916754428454659e-05, 'epoch': 0.16} + + 16%|█▌ | 1158/7378 [3:58:39<21:08:18, 12.23s/it] + 16%|█▌ | 1159/7378 [3:58:52<21:05:59, 12.21s/it] + +{'loss': 0.5381, 'learning_rate': 1.9165789749422014e-05, 'epoch': 0.16} + + 16%|█▌ | 1159/7378 [3:58:52<21:05:59, 12.21s/it] + 16%|█▌ | 1160/7378 [3:59:04<21:21:56, 12.37s/it] + +{'loss': 0.4736, 'learning_rate': 1.916403344773511e-05, 'epoch': 0.16} + + 16%|█▌ | 1160/7378 [3:59:04<21:21:56, 12.37s/it] + 16%|█▌ | 1161/7378 [3:59:16<21:10:29, 12.26s/it] + +{'loss': 0.5024, 'learning_rate': 1.9162275379824372e-05, 'epoch': 0.16} + + 16%|█▌ | 1161/7378 
[3:59:16<21:10:29, 12.26s/it] + 16%|█▌ | 1162/7378 [3:59:28<21:07:29, 12.23s/it] + +{'loss': 0.4815, 'learning_rate': 1.9160515546028644e-05, 'epoch': 0.16} + + 16%|█▌ | 1162/7378 [3:59:28<21:07:29, 12.23s/it] + 16%|█▌ | 1163/7378 [3:59:41<21:06:06, 12.22s/it] + +{'loss': 0.4936, 'learning_rate': 1.9158753946687104e-05, 'epoch': 0.16} + + 16%|█▌ | 1163/7378 [3:59:41<21:06:06, 12.22s/it] + 16%|█▌ | 1164/7378 [3:59:53<21:17:27, 12.33s/it] + +{'loss': 0.442, 'learning_rate': 1.9156990582139276e-05, 'epoch': 0.16} + + 16%|█▌ | 1164/7378 [3:59:53<21:17:27, 12.33s/it] + 16%|█▌ | 1165/7378 [4:00:06<21:16:54, 12.33s/it] + +{'loss': 0.5233, 'learning_rate': 1.915522545272502e-05, 'epoch': 0.16} + + 16%|█▌ | 1165/7378 [4:00:06<21:16:54, 12.33s/it] + 16%|█▌ | 1166/7378 [4:00:18<21:09:43, 12.26s/it] + +{'loss': 0.5303, 'learning_rate': 1.9153458558784536e-05, 'epoch': 0.16} + + 16%|█▌ | 1166/7378 [4:00:18<21:09:43, 12.26s/it] + 16%|█▌ | 1167/7378 [4:00:30<21:23:28, 12.40s/it] + +{'loss': 0.5445, 'learning_rate': 1.915168990065836e-05, 'epoch': 0.16} + + 16%|█▌ | 1167/7378 [4:00:30<21:23:28, 12.40s/it] + 16%|█▌ | 1168/7378 [4:00:42<21:10:23, 12.27s/it] + +{'loss': 0.5529, 'learning_rate': 1.9149919478687378e-05, 'epoch': 0.16} + + 16%|█▌ | 1168/7378 [4:00:42<21:10:23, 12.27s/it] + 16%|█▌ | 1169/7378 [4:00:55<21:11:00, 12.28s/it] + +{'loss': 0.5054, 'learning_rate': 1.9148147293212817e-05, 'epoch': 0.16} + + 16%|█▌ | 1169/7378 [4:00:55<21:11:00, 12.28s/it] + 16%|█▌ | 1170/7378 [4:01:07<21:01:56, 12.20s/it] + +{'loss': 0.4735, 'learning_rate': 1.914637334457623e-05, 'epoch': 0.16} + + 16%|█▌ | 1170/7378 [4:01:07<21:01:56, 12.20s/it] + 16%|█▌ | 1171/7378 [4:01:19<21:14:25, 12.32s/it] + +{'loss': 0.466, 'learning_rate': 1.9144597633119518e-05, 'epoch': 0.16} + + 16%|█▌ | 1171/7378 [4:01:19<21:14:25, 12.32s/it] + 16%|█▌ | 1172/7378 [4:01:31<21:11:34, 12.29s/it] + +{'loss': 0.54, 'learning_rate': 1.914282015918493e-05, 'epoch': 0.16} + + 16%|█▌ | 1172/7378 [4:01:32<21:11:34, 
12.29s/it] + 16%|█▌ | 1173/7378 [4:01:44<21:32:50, 12.50s/it] + +{'loss': 0.461, 'learning_rate': 1.9141040923115034e-05, 'epoch': 0.16} + + 16%|█▌ | 1173/7378 [4:01:45<21:32:50, 12.50s/it] + 16%|█▌ | 1174/7378 [4:01:57<21:28:27, 12.46s/it] + +{'loss': 0.5729, 'learning_rate': 1.9139259925252756e-05, 'epoch': 0.16} + + 16%|█▌ | 1174/7378 [4:01:57<21:28:27, 12.46s/it] + 16%|█▌ | 1175/7378 [4:02:09<21:21:45, 12.40s/it] + +{'loss': 0.4884, 'learning_rate': 1.9137477165941355e-05, 'epoch': 0.16} + + 16%|█▌ | 1175/7378 [4:02:09<21:21:45, 12.40s/it] + 16%|█▌ | 1176/7378 [4:02:21<21:20:00, 12.38s/it] + +{'loss': 0.5299, 'learning_rate': 1.913569264552443e-05, 'epoch': 0.16} + + 16%|█▌ | 1176/7378 [4:02:21<21:20:00, 12.38s/it] + 16%|█▌ | 1177/7378 [4:02:34<21:19:16, 12.38s/it] + +{'loss': 0.5272, 'learning_rate': 1.913390636434592e-05, 'epoch': 0.16} + + 16%|█▌ | 1177/7378 [4:02:34<21:19:16, 12.38s/it] + 16%|█▌ | 1178/7378 [4:02:46<21:12:35, 12.32s/it] + +{'loss': 0.517, 'learning_rate': 1.9132118322750106e-05, 'epoch': 0.16} + + 16%|█▌ | 1178/7378 [4:02:46<21:12:35, 12.32s/it] + 16%|█▌ | 1179/7378 [4:02:59<21:39:30, 12.58s/it] + +{'loss': 0.5463, 'learning_rate': 1.9130328521081596e-05, 'epoch': 0.16} + + 16%|█▌ | 1179/7378 [4:02:59<21:39:30, 12.58s/it] + 16%|█▌ | 1180/7378 [4:03:11<21:25:29, 12.44s/it] + +{'loss': 0.4719, 'learning_rate': 1.912853695968535e-05, 'epoch': 0.16} + + 16%|█▌ | 1180/7378 [4:03:11<21:25:29, 12.44s/it] + 16%|█▌ | 1181/7378 [4:03:24<21:25:18, 12.44s/it] + +{'loss': 0.5253, 'learning_rate': 1.912674363890667e-05, 'epoch': 0.16} + + 16%|█▌ | 1181/7378 [4:03:24<21:25:18, 12.44s/it] + 16%|█▌ | 1182/7378 [4:03:36<21:25:32, 12.45s/it] + +{'loss': 0.4563, 'learning_rate': 1.912494855909118e-05, 'epoch': 0.16} + + 16%|█▌ | 1182/7378 [4:03:36<21:25:32, 12.45s/it] + 16%|█▌ | 1183/7378 [4:03:49<21:23:18, 12.43s/it] + +{'loss': 0.5171, 'learning_rate': 1.9123151720584863e-05, 'epoch': 0.16} + + 16%|█▌ | 1183/7378 [4:03:49<21:23:18, 12.43s/it] + 16%|█▌ | 
1184/7378 [4:04:01<21:19:26, 12.39s/it] + +{'loss': 0.4526, 'learning_rate': 1.9121353123734025e-05, 'epoch': 0.16} + + 16%|█▌ | 1184/7378 [4:04:01<21:19:26, 12.39s/it] + 16%|█▌ | 1185/7378 [4:04:14<21:29:17, 12.49s/it] + +{'loss': 0.5153, 'learning_rate': 1.9119552768885323e-05, 'epoch': 0.16} + + 16%|█▌ | 1185/7378 [4:04:14<21:29:17, 12.49s/it] + 16%|█▌ | 1186/7378 [4:04:26<21:39:58, 12.60s/it] + +{'loss': 0.502, 'learning_rate': 1.9117750656385738e-05, 'epoch': 0.16} + + 16%|█▌ | 1186/7378 [4:04:26<21:39:58, 12.60s/it] + 16%|█▌ | 1187/7378 [4:04:39<21:30:17, 12.50s/it] + +{'loss': 0.5206, 'learning_rate': 1.9115946786582605e-05, 'epoch': 0.16} + + 16%|█▌ | 1187/7378 [4:04:39<21:30:17, 12.50s/it] + 16%|█▌ | 1188/7378 [4:04:51<21:19:39, 12.40s/it] + +{'loss': 0.5269, 'learning_rate': 1.9114141159823597e-05, 'epoch': 0.16} + + 16%|█▌ | 1188/7378 [4:04:51<21:19:39, 12.40s/it] + 16%|█▌ | 1189/7378 [4:05:03<21:21:58, 12.43s/it] + +{'loss': 0.5256, 'learning_rate': 1.911233377645671e-05, 'epoch': 0.16} + + 16%|█▌ | 1189/7378 [4:05:03<21:21:58, 12.43s/it] + 16%|█▌ | 1190/7378 [4:05:15<21:09:19, 12.31s/it] + +{'loss': 0.4577, 'learning_rate': 1.911052463683029e-05, 'epoch': 0.16} + + 16%|█▌ | 1190/7378 [4:05:15<21:09:19, 12.31s/it] + 16%|█▌ | 1191/7378 [4:05:28<21:09:48, 12.31s/it] + +{'loss': 0.5038, 'learning_rate': 1.910871374129303e-05, 'epoch': 0.16} + + 16%|█▌ | 1191/7378 [4:05:28<21:09:48, 12.31s/it] + 16%|█▌ | 1192/7378 [4:05:40<21:10:19, 12.32s/it] + +{'loss': 0.5035, 'learning_rate': 1.9106901090193943e-05, 'epoch': 0.16} + + 16%|█▌ | 1192/7378 [4:05:40<21:10:19, 12.32s/it] + 16%|█▌ | 1193/7378 [4:05:53<21:22:25, 12.44s/it] + +{'loss': 0.494, 'learning_rate': 1.9105086683882394e-05, 'epoch': 0.16} + + 16%|█▌ | 1193/7378 [4:05:53<21:22:25, 12.44s/it] + 16%|█▌ | 1194/7378 [4:06:06<21:30:15, 12.52s/it] + +{'loss': 0.4721, 'learning_rate': 1.9103270522708072e-05, 'epoch': 0.16} + + 16%|█▌ | 1194/7378 [4:06:06<21:30:15, 12.52s/it] + 16%|█▌ | 1195/7378 
[4:06:18<21:23:29, 12.46s/it] + +{'loss': 0.5163, 'learning_rate': 1.9101452607021027e-05, 'epoch': 0.16} + + 16%|█▌ | 1195/7378 [4:06:18<21:23:29, 12.46s/it] + 16%|█▌ | 1196/7378 [4:06:30<21:16:21, 12.39s/it] + +{'loss': 0.5077, 'learning_rate': 1.9099632937171625e-05, 'epoch': 0.16} + + 16%|█▌ | 1196/7378 [4:06:30<21:16:21, 12.39s/it] + 16%|█▌ | 1197/7378 [4:06:42<21:04:22, 12.27s/it] + +{'loss': 0.5319, 'learning_rate': 1.909781151351058e-05, 'epoch': 0.16} + + 16%|█▌ | 1197/7378 [4:06:42<21:04:22, 12.27s/it] + 16%|█▌ | 1198/7378 [4:06:55<21:09:07, 12.32s/it] + +{'loss': 0.5407, 'learning_rate': 1.9095988336388945e-05, 'epoch': 0.16} + + 16%|█▌ | 1198/7378 [4:06:55<21:09:07, 12.32s/it] + 16%|█▋ | 1199/7378 [4:07:07<21:13:19, 12.36s/it] + +{'loss': 0.5184, 'learning_rate': 1.9094163406158105e-05, 'epoch': 0.16} + + 16%|█▋ | 1199/7378 [4:07:07<21:13:19, 12.36s/it] + 16%|█▋ | 1200/7378 [4:07:20<21:24:41, 12.48s/it] + +{'loss': 0.4622, 'learning_rate': 1.9092336723169793e-05, 'epoch': 0.16} + + 16%|█▋ | 1200/7378 [4:07:20<21:24:41, 12.48s/it] + 16%|█▋ | 1201/7378 [4:07:32<21:10:16, 12.34s/it] + +{'loss': 0.5432, 'learning_rate': 1.9090508287776067e-05, 'epoch': 0.16} + + 16%|█▋ | 1201/7378 [4:07:32<21:10:16, 12.34s/it] + 16%|█▋ | 1202/7378 [4:07:44<21:06:35, 12.31s/it] + +{'loss': 0.5102, 'learning_rate': 1.9088678100329337e-05, 'epoch': 0.16} + + 16%|█▋ | 1202/7378 [4:07:44<21:06:35, 12.31s/it] + 16%|█▋ | 1203/7378 [4:07:56<21:05:24, 12.30s/it] + +{'loss': 0.4521, 'learning_rate': 1.9086846161182334e-05, 'epoch': 0.16} + + 16%|█▋ | 1203/7378 [4:07:56<21:05:24, 12.30s/it] + 16%|█▋ | 1204/7378 [4:08:08<21:02:26, 12.27s/it] + +{'loss': 0.5205, 'learning_rate': 1.9085012470688142e-05, 'epoch': 0.16} + + 16%|█▋ | 1204/7378 [4:08:08<21:02:26, 12.27s/it] + 16%|█▋ | 1205/7378 [4:08:21<21:12:47, 12.37s/it] + +{'loss': 0.5452, 'learning_rate': 1.9083177029200174e-05, 'epoch': 0.16} + + 16%|█▋ | 1205/7378 [4:08:21<21:12:47, 12.37s/it] + 16%|█▋ | 1206/7378 [4:08:34<21:19:12, 
12.44s/it] + +{'loss': 0.4571, 'learning_rate': 1.908133983707218e-05, 'epoch': 0.16} + + 16%|█▋ | 1206/7378 [4:08:34<21:19:12, 12.44s/it] + 16%|█▋ | 1207/7378 [4:08:46<21:18:28, 12.43s/it] + +{'loss': 0.4557, 'learning_rate': 1.907950089465825e-05, 'epoch': 0.16} + + 16%|█▋ | 1207/7378 [4:08:46<21:18:28, 12.43s/it] + 16%|█▋ | 1208/7378 [4:08:59<21:30:40, 12.55s/it] + +{'loss': 0.6019, 'learning_rate': 1.907766020231282e-05, 'epoch': 0.16} + + 16%|█▋ | 1208/7378 [4:08:59<21:30:40, 12.55s/it] + 16%|█▋ | 1209/7378 [4:09:11<21:16:33, 12.42s/it] + +{'loss': 0.3863, 'learning_rate': 1.9075817760390646e-05, 'epoch': 0.16} + + 16%|█▋ | 1209/7378 [4:09:11<21:16:33, 12.42s/it] + 16%|█▋ | 1210/7378 [4:09:24<21:20:21, 12.45s/it] + +{'loss': 0.4797, 'learning_rate': 1.9073973569246832e-05, 'epoch': 0.16} + + 16%|█▋ | 1210/7378 [4:09:24<21:20:21, 12.45s/it] + 16%|█▋ | 1211/7378 [4:09:36<21:21:42, 12.47s/it] + +{'loss': 0.497, 'learning_rate': 1.9072127629236816e-05, 'epoch': 0.16} + + 16%|█▋ | 1211/7378 [4:09:36<21:21:42, 12.47s/it] + 16%|█▋ | 1212/7378 [4:09:48<21:10:12, 12.36s/it] + +{'loss': 0.4922, 'learning_rate': 1.9070279940716375e-05, 'epoch': 0.16} + + 16%|█▋ | 1212/7378 [4:09:48<21:10:12, 12.36s/it] + 16%|█▋ | 1213/7378 [4:10:01<21:26:31, 12.52s/it] + +{'loss': 0.4916, 'learning_rate': 1.9068430504041623e-05, 'epoch': 0.16} + + 16%|█▋ | 1213/7378 [4:10:01<21:26:31, 12.52s/it] + 16%|█▋ | 1214/7378 [4:10:14<21:27:54, 12.54s/it] + +{'loss': 0.5273, 'learning_rate': 1.906657931956901e-05, 'epoch': 0.16} + + 16%|█▋ | 1214/7378 [4:10:14<21:27:54, 12.54s/it] + 16%|█▋ | 1215/7378 [4:10:25<21:06:30, 12.33s/it] + +{'loss': 0.523, 'learning_rate': 1.906472638765532e-05, 'epoch': 0.16} + + 16%|█▋ | 1215/7378 [4:10:25<21:06:30, 12.33s/it] + 16%|█▋ | 1216/7378 [4:10:38<21:10:34, 12.37s/it] + +{'loss': 0.4922, 'learning_rate': 1.9062871708657678e-05, 'epoch': 0.16} + + 16%|█▋ | 1216/7378 [4:10:38<21:10:34, 12.37s/it] + 16%|█▋ | 1217/7378 [4:10:51<21:17:48, 12.44s/it] + +{'loss': 
0.4921, 'learning_rate': 1.906101528293355e-05, 'epoch': 0.16} + + 16%|█▋ | 1217/7378 [4:10:51<21:17:48, 12.44s/it] + 17%|█▋ | 1218/7378 [4:11:03<21:13:21, 12.40s/it] + +{'loss': 0.5518, 'learning_rate': 1.905915711084072e-05, 'epoch': 0.17} + + 17%|█▋ | 1218/7378 [4:11:03<21:13:21, 12.40s/it] + 17%|█▋ | 1219/7378 [4:11:15<21:09:07, 12.36s/it] + +{'loss': 0.45, 'learning_rate': 1.9057297192737332e-05, 'epoch': 0.17} + + 17%|█▋ | 1219/7378 [4:11:15<21:09:07, 12.36s/it] + 17%|█▋ | 1220/7378 [4:11:28<21:09:57, 12.37s/it] + +{'loss': 0.4728, 'learning_rate': 1.9055435528981857e-05, 'epoch': 0.17} + + 17%|█▋ | 1220/7378 [4:11:28<21:09:57, 12.37s/it] + 17%|█▋ | 1221/7378 [4:11:40<21:16:22, 12.44s/it] + +{'loss': 0.5196, 'learning_rate': 1.9053572119933093e-05, 'epoch': 0.17} + + 17%|█▋ | 1221/7378 [4:11:40<21:16:22, 12.44s/it] + 17%|█▋ | 1222/7378 [4:11:52<21:14:55, 12.43s/it] + +{'loss': 0.5245, 'learning_rate': 1.9051706965950192e-05, 'epoch': 0.17} + + 17%|█▋ | 1222/7378 [4:11:52<21:14:55, 12.43s/it] + 17%|█▋ | 1223/7378 [4:12:05<21:23:53, 12.52s/it] + +{'loss': 0.4642, 'learning_rate': 1.9049840067392626e-05, 'epoch': 0.17} + + 17%|█▋ | 1223/7378 [4:12:05<21:23:53, 12.52s/it] + 17%|█▋ | 1224/7378 [4:12:17<21:10:39, 12.39s/it] + +{'loss': 0.4729, 'learning_rate': 1.9047971424620214e-05, 'epoch': 0.17} + + 17%|█▋ | 1224/7378 [4:12:17<21:10:39, 12.39s/it] + 17%|█▋ | 1225/7378 [4:12:30<21:28:48, 12.57s/it] + +{'loss': 0.5378, 'learning_rate': 1.9046101037993107e-05, 'epoch': 0.17} + + 17%|█▋ | 1225/7378 [4:12:30<21:28:48, 12.57s/it] + 17%|█▋ | 1226/7378 [4:12:43<21:19:59, 12.48s/it] + +{'loss': 0.4999, 'learning_rate': 1.904422890787179e-05, 'epoch': 0.17} + + 17%|█▋ | 1226/7378 [4:12:43<21:19:59, 12.48s/it] + 17%|█▋ | 1227/7378 [4:12:56<21:39:46, 12.68s/it] + +{'loss': 0.5455, 'learning_rate': 1.9042355034617094e-05, 'epoch': 0.17} + + 17%|█▋ | 1227/7378 [4:12:56<21:39:46, 12.68s/it] + 17%|█▋ | 1228/7378 [4:13:08<21:23:21, 12.52s/it] + +{'loss': 0.4898, 'learning_rate': 
1.904047941859017e-05, 'epoch': 0.17} + + 17%|█▋ | 1228/7378 [4:13:08<21:23:21, 12.52s/it] + 17%|█▋ | 1229/7378 [4:13:20<21:20:16, 12.49s/it] + +{'loss': 0.5218, 'learning_rate': 1.903860206015252e-05, 'epoch': 0.17} + + 17%|█▋ | 1229/7378 [4:13:20<21:20:16, 12.49s/it] + 17%|█▋ | 1230/7378 [4:13:33<21:17:48, 12.47s/it] + +{'loss': 0.443, 'learning_rate': 1.9036722959665975e-05, 'epoch': 0.17} + + 17%|█▋ | 1230/7378 [4:13:33<21:17:48, 12.47s/it] + 17%|█▋ | 1231/7378 [4:13:45<21:00:18, 12.30s/it] + +{'loss': 0.4708, 'learning_rate': 1.9034842117492697e-05, 'epoch': 0.17} + + 17%|█▋ | 1231/7378 [4:13:45<21:00:18, 12.30s/it] + 17%|█▋ | 1232/7378 [4:13:57<20:54:23, 12.25s/it] + +{'loss': 0.4941, 'learning_rate': 1.9032959533995194e-05, 'epoch': 0.17} + + 17%|█▋ | 1232/7378 [4:13:57<20:54:23, 12.25s/it] + 17%|█▋ | 1233/7378 [4:14:09<20:54:50, 12.25s/it] + +{'loss': 0.4314, 'learning_rate': 1.90310752095363e-05, 'epoch': 0.17} + + 17%|█▋ | 1233/7378 [4:14:09<20:54:50, 12.25s/it] + 17%|█▋ | 1234/7378 [4:14:21<20:53:22, 12.24s/it] + +{'loss': 0.4821, 'learning_rate': 1.9029189144479193e-05, 'epoch': 0.17} + + 17%|█▋ | 1234/7378 [4:14:21<20:53:22, 12.24s/it] + 17%|█▋ | 1235/7378 [4:14:33<20:45:31, 12.17s/it] + +{'loss': 0.5528, 'learning_rate': 1.9027301339187384e-05, 'epoch': 0.17} + + 17%|█▋ | 1235/7378 [4:14:33<20:45:31, 12.17s/it] + 17%|█▋ | 1236/7378 [4:14:46<20:52:04, 12.23s/it] + +{'loss': 0.5131, 'learning_rate': 1.902541179402471e-05, 'epoch': 0.17} + + 17%|█▋ | 1236/7378 [4:14:46<20:52:04, 12.23s/it] + 17%|█▋ | 1237/7378 [4:14:58<20:52:07, 12.23s/it] + +{'loss': 0.4613, 'learning_rate': 1.902352050935536e-05, 'epoch': 0.17} + + 17%|█▋ | 1237/7378 [4:14:58<20:52:07, 12.23s/it] + 17%|█▋ | 1238/7378 [4:15:11<21:07:49, 12.39s/it] + +{'loss': 0.5063, 'learning_rate': 1.9021627485543844e-05, 'epoch': 0.17} + + 17%|█▋ | 1238/7378 [4:15:11<21:07:49, 12.39s/it] + 17%|█▋ | 1239/7378 [4:15:23<21:05:25, 12.37s/it] + +{'loss': 0.5077, 'learning_rate': 1.901973272295502e-05, 
'epoch': 0.17} + + 17%|█▋ | 1239/7378 [4:15:23<21:05:25, 12.37s/it] + 17%|█▋ | 1240/7378 [4:15:35<20:54:24, 12.26s/it] + +{'loss': 0.4959, 'learning_rate': 1.9017836221954062e-05, 'epoch': 0.17} + + 17%|█▋ | 1240/7378 [4:15:35<20:54:24, 12.26s/it] + 17%|█▋ | 1241/7378 [4:15:47<20:52:04, 12.24s/it] + +{'loss': 0.5592, 'learning_rate': 1.9015937982906495e-05, 'epoch': 0.17} + + 17%|█▋ | 1241/7378 [4:15:47<20:52:04, 12.24s/it] + 17%|█▋ | 1242/7378 [4:15:59<20:50:06, 12.22s/it] + +{'loss': 0.4917, 'learning_rate': 1.9014038006178182e-05, 'epoch': 0.17} + + 17%|█▋ | 1242/7378 [4:15:59<20:50:06, 12.22s/it] + 17%|█▋ | 1243/7378 [4:16:11<20:49:00, 12.22s/it] + +{'loss': 0.4652, 'learning_rate': 1.9012136292135306e-05, 'epoch': 0.17} + + 17%|█▋ | 1243/7378 [4:16:11<20:49:00, 12.22s/it] + 17%|█▋ | 1244/7378 [4:16:24<20:50:45, 12.23s/it] + +{'loss': 0.4509, 'learning_rate': 1.9010232841144395e-05, 'epoch': 0.17} + + 17%|█▋ | 1244/7378 [4:16:24<20:50:45, 12.23s/it] + 17%|█▋ | 1245/7378 [4:16:35<20:34:29, 12.08s/it] + +{'loss': 0.4921, 'learning_rate': 1.900832765357231e-05, 'epoch': 0.17} + + 17%|█▋ | 1245/7378 [4:16:35<20:34:29, 12.08s/it] + 17%|█▋ | 1246/7378 [4:16:48<20:37:40, 12.11s/it] + +{'loss': 0.4229, 'learning_rate': 1.9006420729786246e-05, 'epoch': 0.17} + + 17%|█▋ | 1246/7378 [4:16:48<20:37:40, 12.11s/it] + 17%|█▋ | 1247/7378 [4:17:00<20:43:29, 12.17s/it] + +{'loss': 0.4933, 'learning_rate': 1.900451207015373e-05, 'epoch': 0.17} + + 17%|█▋ | 1247/7378 [4:17:00<20:43:29, 12.17s/it] + 17%|█▋ | 1248/7378 [4:17:12<20:44:44, 12.18s/it] + +{'loss': 0.5228, 'learning_rate': 1.9002601675042632e-05, 'epoch': 0.17} + + 17%|█▋ | 1248/7378 [4:17:12<20:44:44, 12.18s/it] + 17%|█▋ | 1249/7378 [4:17:25<21:07:46, 12.41s/it] + +{'loss': 0.4969, 'learning_rate': 1.9000689544821145e-05, 'epoch': 0.17} + + 17%|█▋ | 1249/7378 [4:17:25<21:07:46, 12.41s/it] + 17%|█▋ | 1250/7378 [4:17:38<21:18:02, 12.51s/it] + +{'loss': 0.5095, 'learning_rate': 1.8998775679857805e-05, 'epoch': 0.17} + + 
17%|█▋ | 1250/7378 [4:17:38<21:18:02, 12.51s/it] + 17%|█▋ | 1251/7378 [4:17:50<21:10:20, 12.44s/it] + +{'loss': 0.4765, 'learning_rate': 1.8996860080521478e-05, 'epoch': 0.17} + + 17%|█▋ | 1251/7378 [4:17:50<21:10:20, 12.44s/it] + 17%|█▋ | 1252/7378 [4:18:03<21:22:13, 12.56s/it] + +{'loss': 0.4983, 'learning_rate': 1.8994942747181368e-05, 'epoch': 0.17} + + 17%|█▋ | 1252/7378 [4:18:03<21:22:13, 12.56s/it] + 17%|█▋ | 1253/7378 [4:18:15<21:12:25, 12.46s/it] + +{'loss': 0.5368, 'learning_rate': 1.899302368020701e-05, 'epoch': 0.17} + + 17%|█▋ | 1253/7378 [4:18:15<21:12:25, 12.46s/it] + 17%|█▋ | 1254/7378 [4:18:28<21:08:08, 12.42s/it] + +{'loss': 0.4799, 'learning_rate': 1.899110287996827e-05, 'epoch': 0.17} + + 17%|█▋ | 1254/7378 [4:18:28<21:08:08, 12.42s/it] + 17%|█▋ | 1255/7378 [4:18:40<20:56:05, 12.31s/it] + +{'loss': 0.5153, 'learning_rate': 1.8989180346835356e-05, 'epoch': 0.17} + + 17%|█▋ | 1255/7378 [4:18:40<20:56:05, 12.31s/it] + 17%|█▋ | 1256/7378 [4:18:52<20:48:48, 12.24s/it] + +{'loss': 0.5269, 'learning_rate': 1.8987256081178808e-05, 'epoch': 0.17} + + 17%|█▋ | 1256/7378 [4:18:52<20:48:48, 12.24s/it] + 17%|█▋ | 1257/7378 [4:19:04<20:50:38, 12.26s/it] + +{'loss': 0.5469, 'learning_rate': 1.8985330083369494e-05, 'epoch': 0.17} + + 17%|█▋ | 1257/7378 [4:19:04<20:50:38, 12.26s/it] + 17%|█▋ | 1258/7378 [4:19:16<20:51:50, 12.27s/it] + +{'loss': 0.476, 'learning_rate': 1.8983402353778625e-05, 'epoch': 0.17} + + 17%|█▋ | 1258/7378 [4:19:16<20:51:50, 12.27s/it] + 17%|█▋ | 1259/7378 [4:19:29<20:51:23, 12.27s/it] + +{'loss': 0.5513, 'learning_rate': 1.8981472892777735e-05, 'epoch': 0.17} + + 17%|█▋ | 1259/7378 [4:19:29<20:51:23, 12.27s/it] + 17%|█▋ | 1260/7378 [4:19:41<20:47:42, 12.24s/it] + +{'loss': 0.5579, 'learning_rate': 1.89795417007387e-05, 'epoch': 0.17} + + 17%|█▋ | 1260/7378 [4:19:41<20:47:42, 12.24s/it] + 17%|█▋ | 1261/7378 [4:19:53<20:52:27, 12.28s/it] + +{'loss': 0.4645, 'learning_rate': 1.8977608778033726e-05, 'epoch': 0.17} + + 17%|█▋ | 1261/7378 
[4:19:53<20:52:27, 12.28s/it] + 17%|█▋ | 1262/7378 [4:20:05<20:49:23, 12.26s/it] + +{'loss': 0.5615, 'learning_rate': 1.897567412503536e-05, 'epoch': 0.17} + + 17%|█▋ | 1262/7378 [4:20:05<20:49:23, 12.26s/it] + 17%|█▋ | 1263/7378 [4:20:17<20:44:08, 12.21s/it] + +{'loss': 0.4555, 'learning_rate': 1.8973737742116464e-05, 'epoch': 0.17} + + 17%|█▋ | 1263/7378 [4:20:17<20:44:08, 12.21s/it] + 17%|█▋ | 1264/7378 [4:20:30<20:50:30, 12.27s/it] + +{'loss': 0.4106, 'learning_rate': 1.8971799629650253e-05, 'epoch': 0.17} + + 17%|█▋ | 1264/7378 [4:20:30<20:50:30, 12.27s/it] + 17%|█▋ | 1265/7378 [4:20:42<20:55:31, 12.32s/it] + +{'loss': 0.5032, 'learning_rate': 1.896985978801027e-05, 'epoch': 0.17} + + 17%|█▋ | 1265/7378 [4:20:42<20:55:31, 12.32s/it] + 17%|█▋ | 1266/7378 [4:20:55<21:08:07, 12.45s/it] + +{'loss': 0.5608, 'learning_rate': 1.896791821757038e-05, 'epoch': 0.17} + + 17%|█▋ | 1266/7378 [4:20:55<21:08:07, 12.45s/it] + 17%|█▋ | 1267/7378 [4:21:07<21:02:38, 12.40s/it] + +{'loss': 0.4742, 'learning_rate': 1.8965974918704803e-05, 'epoch': 0.17} + + 17%|█▋ | 1267/7378 [4:21:07<21:02:38, 12.40s/it] + 17%|█▋ | 1268/7378 [4:21:20<21:06:14, 12.43s/it] + +{'loss': 0.586, 'learning_rate': 1.8964029891788067e-05, 'epoch': 0.17} + + 17%|█▋ | 1268/7378 [4:21:20<21:06:14, 12.43s/it] + 17%|█▋ | 1269/7378 [4:21:32<21:08:15, 12.46s/it] + +{'loss': 0.5112, 'learning_rate': 1.8962083137195054e-05, 'epoch': 0.17} + + 17%|█▋ | 1269/7378 [4:21:32<21:08:15, 12.46s/it] + 17%|█▋ | 1270/7378 [4:21:44<20:57:26, 12.35s/it] + +{'loss': 0.4862, 'learning_rate': 1.8960134655300966e-05, 'epoch': 0.17} + + 17%|█▋ | 1270/7378 [4:21:44<20:57:26, 12.35s/it] + 17%|█▋ | 1271/7378 [4:21:57<20:49:50, 12.28s/it] + +{'loss': 0.484, 'learning_rate': 1.8958184446481343e-05, 'epoch': 0.17} + + 17%|█▋ | 1271/7378 [4:21:57<20:49:50, 12.28s/it] + 17%|█▋ | 1272/7378 [4:22:09<20:44:25, 12.23s/it] + +{'loss': 0.4199, 'learning_rate': 1.8956232511112058e-05, 'epoch': 0.17} + + 17%|█▋ | 1272/7378 [4:22:09<20:44:25, 
12.23s/it] + 17%|█▋ | 1273/7378 [4:22:21<20:36:54, 12.16s/it] + +{'loss': 0.5356, 'learning_rate': 1.895427884956932e-05, 'epoch': 0.17} + + 17%|█▋ | 1273/7378 [4:22:21<20:36:54, 12.16s/it] + 17%|█▋ | 1274/7378 [4:22:33<20:38:22, 12.17s/it] + +{'loss': 0.4871, 'learning_rate': 1.8952323462229658e-05, 'epoch': 0.17} + + 17%|█▋ | 1274/7378 [4:22:33<20:38:22, 12.17s/it] + 17%|█▋ | 1275/7378 [4:22:45<20:49:18, 12.28s/it] + +{'loss': 0.4664, 'learning_rate': 1.895036634946995e-05, 'epoch': 0.17} + + 17%|█▋ | 1275/7378 [4:22:45<20:49:18, 12.28s/it] + 17%|█▋ | 1276/7378 [4:22:58<20:44:56, 12.24s/it] + +{'loss': 0.5433, 'learning_rate': 1.894840751166739e-05, 'epoch': 0.17} + + 17%|█▋ | 1276/7378 [4:22:58<20:44:56, 12.24s/it] + 17%|█▋ | 1277/7378 [4:23:10<20:43:17, 12.23s/it] + +{'loss': 0.4726, 'learning_rate': 1.8946446949199525e-05, 'epoch': 0.17} + + 17%|█▋ | 1277/7378 [4:23:10<20:43:17, 12.23s/it] + 17%|█▋ | 1278/7378 [4:23:22<20:42:45, 12.22s/it] + +{'loss': 0.4867, 'learning_rate': 1.894448466244421e-05, 'epoch': 0.17} + + 17%|█▋ | 1278/7378 [4:23:22<20:42:45, 12.22s/it] + 17%|█▋ | 1279/7378 [4:23:35<20:57:18, 12.37s/it] + +{'loss': 0.4428, 'learning_rate': 1.8942520651779657e-05, 'epoch': 0.17} + + 17%|█▋ | 1279/7378 [4:23:35<20:57:18, 12.37s/it] + 17%|█▋ | 1280/7378 [4:23:47<20:54:22, 12.34s/it] + +{'loss': 0.5711, 'learning_rate': 1.8940554917584392e-05, 'epoch': 0.17} + + 17%|█▋ | 1280/7378 [4:23:47<20:54:22, 12.34s/it] + 17%|█▋ | 1281/7378 [4:23:59<20:47:43, 12.28s/it] + +{'loss': 0.4812, 'learning_rate': 1.893858746023728e-05, 'epoch': 0.17} + + 17%|█▋ | 1281/7378 [4:23:59<20:47:43, 12.28s/it] + 17%|█▋ | 1282/7378 [4:24:11<20:45:52, 12.26s/it] + +{'loss': 0.5676, 'learning_rate': 1.8936618280117516e-05, 'epoch': 0.17} + + 17%|█▋ | 1282/7378 [4:24:11<20:45:52, 12.26s/it] + 17%|█▋ | 1283/7378 [4:24:24<20:45:16, 12.26s/it] + +{'loss': 0.5338, 'learning_rate': 1.893464737760463e-05, 'epoch': 0.17} + + 17%|█▋ | 1283/7378 [4:24:24<20:45:16, 12.26s/it] + 17%|█▋ | 
1284/7378 [4:24:36<20:45:20, 12.26s/it] + +{'loss': 0.4417, 'learning_rate': 1.8932674753078485e-05, 'epoch': 0.17} + + 17%|█▋ | 1284/7378 [4:24:36<20:45:20, 12.26s/it] + 17%|█▋ | 1285/7378 [4:24:48<20:46:21, 12.27s/it] + +{'loss': 0.4871, 'learning_rate': 1.8930700406919275e-05, 'epoch': 0.17} + + 17%|█▋ | 1285/7378 [4:24:48<20:46:21, 12.27s/it] + 17%|█▋ | 1286/7378 [4:25:00<20:38:48, 12.20s/it] + +{'loss': 0.5116, 'learning_rate': 1.8928724339507515e-05, 'epoch': 0.17} + + 17%|█▋ | 1286/7378 [4:25:00<20:38:48, 12.20s/it] + 17%|█▋ | 1287/7378 [4:25:12<20:44:02, 12.25s/it] + +{'loss': 0.4845, 'learning_rate': 1.892674655122407e-05, 'epoch': 0.17} + + 17%|█▋ | 1287/7378 [4:25:13<20:44:02, 12.25s/it] + 17%|█▋ | 1288/7378 [4:25:25<20:43:45, 12.25s/it] + +{'loss': 0.4628, 'learning_rate': 1.8924767042450122e-05, 'epoch': 0.17} + + 17%|█▋ | 1288/7378 [4:25:25<20:43:45, 12.25s/it] + 17%|█▋ | 1289/7378 [4:25:37<20:40:24, 12.22s/it] + +{'loss': 0.5127, 'learning_rate': 1.8922785813567194e-05, 'epoch': 0.17} + + 17%|█▋ | 1289/7378 [4:25:37<20:40:24, 12.22s/it] + 17%|█▋ | 1290/7378 [4:25:49<20:41:34, 12.24s/it] + +{'loss': 0.4326, 'learning_rate': 1.8920802864957136e-05, 'epoch': 0.17} + + 17%|█▋ | 1290/7378 [4:25:49<20:41:34, 12.24s/it] + 17%|█▋ | 1291/7378 [4:26:02<20:59:03, 12.41s/it] + +{'loss': 0.4811, 'learning_rate': 1.891881819700213e-05, 'epoch': 0.17} + + 17%|█▋ | 1291/7378 [4:26:02<20:59:03, 12.41s/it] + 18%|█▊ | 1292/7378 [4:26:14<20:46:12, 12.29s/it] + +{'loss': 0.5071, 'learning_rate': 1.891683181008469e-05, 'epoch': 0.18} + + 18%|█▊ | 1292/7378 [4:26:14<20:46:12, 12.29s/it] + 18%|█▊ | 1293/7378 [4:26:27<20:55:00, 12.37s/it] + +{'loss': 0.5223, 'learning_rate': 1.891484370458766e-05, 'epoch': 0.18} + + 18%|█▊ | 1293/7378 [4:26:27<20:55:00, 12.37s/it] + 18%|█▊ | 1294/7378 [4:26:39<20:54:26, 12.37s/it] + +{'loss': 0.4555, 'learning_rate': 1.8912853880894215e-05, 'epoch': 0.18} + + 18%|█▊ | 1294/7378 [4:26:39<20:54:26, 12.37s/it] + 18%|█▊ | 1295/7378 
[4:26:51<20:57:10, 12.40s/it] + +{'loss': 0.4684, 'learning_rate': 1.8910862339387865e-05, 'epoch': 0.18} + + 18%|█▊ | 1295/7378 [4:26:51<20:57:10, 12.40s/it] + 18%|█▊ | 1296/7378 [4:27:04<20:52:59, 12.36s/it] + +{'loss': 0.5185, 'learning_rate': 1.890886908045245e-05, 'epoch': 0.18} + + 18%|█▊ | 1296/7378 [4:27:04<20:52:59, 12.36s/it] + 18%|█▊ | 1297/7378 [4:27:16<20:46:01, 12.29s/it] + +{'loss': 0.4456, 'learning_rate': 1.890687410447213e-05, 'epoch': 0.18} + + 18%|█▊ | 1297/7378 [4:27:16<20:46:01, 12.29s/it] + 18%|█▊ | 1298/7378 [4:27:28<20:42:23, 12.26s/it] + +{'loss': 0.4309, 'learning_rate': 1.890487741183142e-05, 'epoch': 0.18} + + 18%|█▊ | 1298/7378 [4:27:28<20:42:23, 12.26s/it] + 18%|█▊ | 1299/7378 [4:27:40<20:43:41, 12.28s/it] + +{'loss': 0.485, 'learning_rate': 1.890287900291514e-05, 'epoch': 0.18} + + 18%|█▊ | 1299/7378 [4:27:40<20:43:41, 12.28s/it] + 18%|█▊ | 1300/7378 [4:27:52<20:37:42, 12.22s/it] + +{'loss': 0.5085, 'learning_rate': 1.8900878878108452e-05, 'epoch': 0.18} + + 18%|█▊ | 1300/7378 [4:27:52<20:37:42, 12.22s/it] + 18%|█▊ | 1301/7378 [4:28:05<20:47:48, 12.32s/it] + +{'loss': 0.5022, 'learning_rate': 1.8898877037796856e-05, 'epoch': 0.18} + + 18%|█▊ | 1301/7378 [4:28:05<20:47:48, 12.32s/it] + 18%|█▊ | 1302/7378 [4:28:17<20:51:00, 12.35s/it] + +{'loss': 0.4782, 'learning_rate': 1.8896873482366173e-05, 'epoch': 0.18} + + 18%|█▊ | 1302/7378 [4:28:17<20:51:00, 12.35s/it] + 18%|█▊ | 1303/7378 [4:28:30<20:49:27, 12.34s/it] + +{'loss': 0.4602, 'learning_rate': 1.8894868212202553e-05, 'epoch': 0.18} + + 18%|█▊ | 1303/7378 [4:28:30<20:49:27, 12.34s/it] + 18%|█▊ | 1304/7378 [4:28:42<20:52:07, 12.37s/it] + +{'loss': 0.4944, 'learning_rate': 1.8892861227692485e-05, 'epoch': 0.18} + + 18%|█▊ | 1304/7378 [4:28:42<20:52:07, 12.37s/it] + 18%|█▊ | 1305/7378 [4:28:54<20:49:42, 12.35s/it] + +{'loss': 0.4498, 'learning_rate': 1.8890852529222778e-05, 'epoch': 0.18} + + 18%|█▊ | 1305/7378 [4:28:54<20:49:42, 12.35s/it] + 18%|█▊ | 1306/7378 [4:29:07<20:52:32, 
12.38s/it] + +{'loss': 0.4439, 'learning_rate': 1.8888842117180584e-05, 'epoch': 0.18} + + 18%|█▊ | 1306/7378 [4:29:07<20:52:32, 12.38s/it] + 18%|█▊ | 1307/7378 [4:29:21<21:35:25, 12.80s/it] + +{'loss': 0.4809, 'learning_rate': 1.8886829991953372e-05, 'epoch': 0.18} + + 18%|█▊ | 1307/7378 [4:29:21<21:35:25, 12.80s/it] + 18%|█▊ | 1308/7378 [4:29:33<21:18:27, 12.64s/it] + +{'loss': 0.4701, 'learning_rate': 1.8884816153928953e-05, 'epoch': 0.18} + + 18%|█▊ | 1308/7378 [4:29:33<21:18:27, 12.64s/it] + 18%|█▊ | 1309/7378 [4:29:45<21:04:49, 12.50s/it] + +{'loss': 0.4891, 'learning_rate': 1.888280060349546e-05, 'epoch': 0.18} + + 18%|█▊ | 1309/7378 [4:29:45<21:04:49, 12.50s/it] + 18%|█▊ | 1310/7378 [4:29:57<20:57:31, 12.43s/it] + +{'loss': 0.4761, 'learning_rate': 1.8880783341041357e-05, 'epoch': 0.18} + + 18%|█▊ | 1310/7378 [4:29:57<20:57:31, 12.43s/it] + 18%|█▊ | 1311/7378 [4:30:10<20:58:21, 12.44s/it] + +{'loss': 0.5801, 'learning_rate': 1.8878764366955446e-05, 'epoch': 0.18} + + 18%|█▊ | 1311/7378 [4:30:10<20:58:21, 12.44s/it] + 18%|█▊ | 1312/7378 [4:30:23<21:05:11, 12.51s/it] + +{'loss': 0.5361, 'learning_rate': 1.8876743681626846e-05, 'epoch': 0.18} + + 18%|█▊ | 1312/7378 [4:30:23<21:05:11, 12.51s/it] + 18%|█▊ | 1313/7378 [4:30:35<21:03:52, 12.50s/it] + +{'loss': 0.5106, 'learning_rate': 1.8874721285445016e-05, 'epoch': 0.18} + + 18%|█▊ | 1313/7378 [4:30:35<21:03:52, 12.50s/it] + 18%|█▊ | 1314/7378 [4:30:48<21:07:33, 12.54s/it] + +{'loss': 0.4719, 'learning_rate': 1.887269717879974e-05, 'epoch': 0.18} + + 18%|█▊ | 1314/7378 [4:30:48<21:07:33, 12.54s/it] + 18%|█▊ | 1315/7378 [4:31:00<20:56:35, 12.44s/it] + +{'loss': 0.4827, 'learning_rate': 1.8870671362081133e-05, 'epoch': 0.18} + + 18%|█▊ | 1315/7378 [4:31:00<20:56:35, 12.44s/it] + 18%|█▊ | 1316/7378 [4:31:12<20:48:13, 12.35s/it] + +{'loss': 0.4959, 'learning_rate': 1.8868643835679638e-05, 'epoch': 0.18} + + 18%|█▊ | 1316/7378 [4:31:12<20:48:13, 12.35s/it] + 18%|█▊ | 1317/7378 [4:31:24<20:39:57, 12.27s/it] + 
+{'loss': 0.466, 'learning_rate': 1.8866614599986032e-05, 'epoch': 0.18} + + 18%|█▊ | 1317/7378 [4:31:24<20:39:57, 12.27s/it] + 18%|█▊ | 1318/7378 [4:31:36<20:38:36, 12.26s/it] + +{'loss': 0.4939, 'learning_rate': 1.8864583655391417e-05, 'epoch': 0.18} + + 18%|█▊ | 1318/7378 [4:31:36<20:38:36, 12.26s/it] + 18%|█▊ | 1319/7378 [4:31:49<20:45:26, 12.33s/it] + +{'loss': 0.529, 'learning_rate': 1.8862551002287223e-05, 'epoch': 0.18} + + 18%|█▊ | 1319/7378 [4:31:49<20:45:26, 12.33s/it] + 18%|█▊ | 1320/7378 [4:32:01<20:45:29, 12.34s/it] + +{'loss': 0.4727, 'learning_rate': 1.8860516641065218e-05, 'epoch': 0.18} + + 18%|█▊ | 1320/7378 [4:32:01<20:45:29, 12.34s/it] + 18%|█▊ | 1321/7378 [4:32:14<20:51:36, 12.40s/it] + +{'loss': 0.5567, 'learning_rate': 1.8858480572117485e-05, 'epoch': 0.18} + + 18%|█▊ | 1321/7378 [4:32:14<20:51:36, 12.40s/it] + 18%|█▊ | 1322/7378 [4:32:26<20:40:15, 12.29s/it] + +{'loss': 0.4541, 'learning_rate': 1.8856442795836453e-05, 'epoch': 0.18} + + 18%|█▊ | 1322/7378 [4:32:26<20:40:15, 12.29s/it] + 18%|█▊ | 1323/7378 [4:32:38<20:43:21, 12.32s/it] + +{'loss': 0.46, 'learning_rate': 1.885440331261487e-05, 'epoch': 0.18} + + 18%|█▊ | 1323/7378 [4:32:38<20:43:21, 12.32s/it] + 18%|█▊ | 1324/7378 [4:32:50<20:34:44, 12.24s/it] + +{'loss': 0.5202, 'learning_rate': 1.8852362122845807e-05, 'epoch': 0.18} + + 18%|█▊ | 1324/7378 [4:32:50<20:34:44, 12.24s/it] + 18%|█▊ | 1325/7378 [4:33:03<20:39:31, 12.29s/it] + +{'loss': 0.4512, 'learning_rate': 1.8850319226922678e-05, 'epoch': 0.18} + + 18%|█▊ | 1325/7378 [4:33:03<20:39:31, 12.29s/it] + 18%|█▊ | 1326/7378 [4:33:15<20:57:40, 12.47s/it] + +{'loss': 0.4665, 'learning_rate': 1.8848274625239216e-05, 'epoch': 0.18} + + 18%|█▊ | 1326/7378 [4:33:15<20:57:40, 12.47s/it] + 18%|█▊ | 1327/7378 [4:33:27<20:41:07, 12.31s/it] + +{'loss': 0.4976, 'learning_rate': 1.8846228318189488e-05, 'epoch': 0.18} + + 18%|█▊ | 1327/7378 [4:33:27<20:41:07, 12.31s/it] + 18%|█▊ | 1328/7378 [4:33:40<20:46:35, 12.36s/it] + +{'loss': 0.5207, 
'learning_rate': 1.884418030616789e-05, 'epoch': 0.18} + + 18%|█▊ | 1328/7378 [4:33:40<20:46:35, 12.36s/it] + 18%|█▊ | 1329/7378 [4:33:52<20:44:19, 12.34s/it] + +{'loss': 0.4816, 'learning_rate': 1.8842130589569137e-05, 'epoch': 0.18} + + 18%|█▊ | 1329/7378 [4:33:52<20:44:19, 12.34s/it] + 18%|█▊ | 1330/7378 [4:34:04<20:39:37, 12.30s/it] + +{'loss': 0.547, 'learning_rate': 1.8840079168788288e-05, 'epoch': 0.18} + + 18%|█▊ | 1330/7378 [4:34:04<20:39:37, 12.30s/it] + 18%|█▊ | 1331/7378 [4:34:17<20:37:32, 12.28s/it] + +{'loss': 0.4494, 'learning_rate': 1.8838026044220716e-05, 'epoch': 0.18} + + 18%|█▊ | 1331/7378 [4:34:17<20:37:32, 12.28s/it] + 18%|█▊ | 1332/7378 [4:34:29<20:34:29, 12.25s/it] + +{'loss': 0.5031, 'learning_rate': 1.8835971216262132e-05, 'epoch': 0.18} + + 18%|█▊ | 1332/7378 [4:34:29<20:34:29, 12.25s/it] + 18%|█▊ | 1333/7378 [4:34:41<20:36:25, 12.27s/it] + +{'loss': 0.4553, 'learning_rate': 1.8833914685308568e-05, 'epoch': 0.18} + + 18%|█▊ | 1333/7378 [4:34:41<20:36:25, 12.27s/it] + 18%|█▊ | 1334/7378 [4:34:54<20:47:06, 12.38s/it] + +{'loss': 0.4843, 'learning_rate': 1.8831856451756394e-05, 'epoch': 0.18} + + 18%|█▊ | 1334/7378 [4:34:54<20:47:06, 12.38s/it] + 18%|█▊ | 1335/7378 [4:35:06<20:51:21, 12.42s/it] + +{'loss': 0.4912, 'learning_rate': 1.88297965160023e-05, 'epoch': 0.18} + + 18%|█▊ | 1335/7378 [4:35:06<20:51:21, 12.42s/it] + 18%|█▊ | 1336/7378 [4:35:18<20:45:14, 12.37s/it] + +{'loss': 0.5228, 'learning_rate': 1.8827734878443303e-05, 'epoch': 0.18} + + 18%|█▊ | 1336/7378 [4:35:19<20:45:14, 12.37s/it] + 18%|█▊ | 1337/7378 [4:35:31<20:40:14, 12.32s/it] + +{'loss': 0.499, 'learning_rate': 1.8825671539476754e-05, 'epoch': 0.18} + + 18%|█▊ | 1337/7378 [4:35:31<20:40:14, 12.32s/it] + 18%|█▊ | 1338/7378 [4:35:43<20:51:53, 12.44s/it] + +{'loss': 0.4953, 'learning_rate': 1.882360649950033e-05, 'epoch': 0.18} + + 18%|█▊ | 1338/7378 [4:35:43<20:51:53, 12.44s/it] + 18%|█▊ | 1339/7378 [4:35:56<20:44:43, 12.37s/it] + +{'loss': 0.4207, 'learning_rate': 
1.8821539758912033e-05, 'epoch': 0.18} + + 18%|█▊ | 1339/7378 [4:35:56<20:44:43, 12.37s/it] + 18%|█▊ | 1340/7378 [4:36:08<20:54:34, 12.47s/it] + +{'loss': 0.523, 'learning_rate': 1.8819471318110195e-05, 'epoch': 0.18} + + 18%|█▊ | 1340/7378 [4:36:08<20:54:34, 12.47s/it] + 18%|█▊ | 1341/7378 [4:36:20<20:39:04, 12.31s/it] + +{'loss': 0.4463, 'learning_rate': 1.8817401177493477e-05, 'epoch': 0.18} + + 18%|█▊ | 1341/7378 [4:36:20<20:39:04, 12.31s/it] + 18%|█▊ | 1342/7378 [4:36:32<20:33:24, 12.26s/it] + +{'loss': 0.4731, 'learning_rate': 1.881532933746087e-05, 'epoch': 0.18} + + 18%|█▊ | 1342/7378 [4:36:32<20:33:24, 12.26s/it] + 18%|█▊ | 1343/7378 [4:36:45<20:31:11, 12.24s/it] + +{'loss': 0.4723, 'learning_rate': 1.8813255798411676e-05, 'epoch': 0.18} + + 18%|█▊ | 1343/7378 [4:36:45<20:31:11, 12.24s/it] + 18%|█▊ | 1344/7378 [4:36:57<20:26:22, 12.19s/it] + +{'loss': 0.4893, 'learning_rate': 1.881118056074555e-05, 'epoch': 0.18} + + 18%|█▊ | 1344/7378 [4:36:57<20:26:22, 12.19s/it] + 18%|█▊ | 1345/7378 [4:37:10<20:50:20, 12.44s/it] + +{'loss': 0.5041, 'learning_rate': 1.880910362486246e-05, 'epoch': 0.18} + + 18%|█▊ | 1345/7378 [4:37:10<20:50:20, 12.44s/it] + 18%|█▊ | 1346/7378 [4:37:22<20:47:44, 12.41s/it] + +{'loss': 0.5079, 'learning_rate': 1.88070249911627e-05, 'epoch': 0.18} + + 18%|█▊ | 1346/7378 [4:37:22<20:47:44, 12.41s/it] + 18%|█▊ | 1347/7378 [4:37:34<20:41:25, 12.35s/it] + +{'loss': 0.4963, 'learning_rate': 1.8804944660046887e-05, 'epoch': 0.18} + + 18%|█▊ | 1347/7378 [4:37:34<20:41:25, 12.35s/it] + 18%|█▊ | 1348/7378 [4:37:47<20:45:47, 12.40s/it] + +{'loss': 0.4296, 'learning_rate': 1.8802862631915983e-05, 'epoch': 0.18} + + 18%|█▊ | 1348/7378 [4:37:47<20:45:47, 12.40s/it] + 18%|█▊ | 1349/7378 [4:37:59<20:30:39, 12.25s/it] + +{'loss': 0.453, 'learning_rate': 1.8800778907171264e-05, 'epoch': 0.18} + + 18%|█▊ | 1349/7378 [4:37:59<20:30:39, 12.25s/it] + 18%|█▊ | 1350/7378 [4:38:11<20:29:16, 12.24s/it] + +{'loss': 0.4578, 'learning_rate': 1.879869348621433e-05, 
'epoch': 0.18} + + 18%|█▊ | 1350/7378 [4:38:11<20:29:16, 12.24s/it] + 18%|█▊ | 1351/7378 [4:38:23<20:31:19, 12.26s/it] + +{'loss': 0.4536, 'learning_rate': 1.879660636944712e-05, 'epoch': 0.18} + + 18%|█▊ | 1351/7378 [4:38:23<20:31:19, 12.26s/it] + 18%|█▊ | 1352/7378 [4:38:35<20:24:00, 12.19s/it] + +{'loss': 0.4932, 'learning_rate': 1.879451755727189e-05, 'epoch': 0.18} + + 18%|█▊ | 1352/7378 [4:38:35<20:24:00, 12.19s/it] + 18%|█▊ | 1353/7378 [4:38:47<20:18:16, 12.13s/it] + +{'loss': 0.4728, 'learning_rate': 1.8792427050091225e-05, 'epoch': 0.18} + + 18%|█▊ | 1353/7378 [4:38:47<20:18:16, 12.13s/it] + 18%|█▊ | 1354/7378 [4:39:00<20:28:22, 12.23s/it] + +{'loss': 0.4974, 'learning_rate': 1.879033484830804e-05, 'epoch': 0.18} + + 18%|█▊ | 1354/7378 [4:39:00<20:28:22, 12.23s/it] + 18%|█▊ | 1355/7378 [4:39:12<20:23:59, 12.19s/it] + +{'loss': 0.49, 'learning_rate': 1.878824095232557e-05, 'epoch': 0.18} + + 18%|█▊ | 1355/7378 [4:39:12<20:23:59, 12.19s/it] + 18%|█▊ | 1356/7378 [4:39:24<20:30:00, 12.26s/it] + +{'loss': 0.5474, 'learning_rate': 1.8786145362547387e-05, 'epoch': 0.18} + + 18%|█▊ | 1356/7378 [4:39:24<20:30:00, 12.26s/it] + 18%|█▊ | 1357/7378 [4:39:36<20:30:41, 12.26s/it] + +{'loss': 0.523, 'learning_rate': 1.8784048079377375e-05, 'epoch': 0.18} + + 18%|█▊ | 1357/7378 [4:39:36<20:30:41, 12.26s/it] + 18%|█▊ | 1358/7378 [4:39:49<20:28:11, 12.24s/it] + +{'loss': 0.5169, 'learning_rate': 1.8781949103219758e-05, 'epoch': 0.18} + + 18%|█▊ | 1358/7378 [4:39:49<20:28:11, 12.24s/it] + 18%|█▊ | 1359/7378 [4:40:01<20:34:57, 12.31s/it] + +{'loss': 0.4937, 'learning_rate': 1.8779848434479076e-05, 'epoch': 0.18} + + 18%|█▊ | 1359/7378 [4:40:01<20:34:57, 12.31s/it] + 18%|█▊ | 1360/7378 [4:40:14<20:43:10, 12.39s/it] + +{'loss': 0.425, 'learning_rate': 1.877774607356021e-05, 'epoch': 0.18} + + 18%|█▊ | 1360/7378 [4:40:14<20:43:10, 12.39s/it] + 18%|█▊ | 1361/7378 [4:40:26<20:39:50, 12.36s/it] + +{'loss': 0.4957, 'learning_rate': 1.8775642020868346e-05, 'epoch': 0.18} + + 18%|█▊ | 
1361/7378 [4:40:26<20:39:50, 12.36s/it] + 18%|█▊ | 1362/7378 [4:40:38<20:36:23, 12.33s/it] + +{'loss': 0.4726, 'learning_rate': 1.8773536276809016e-05, 'epoch': 0.18} + + 18%|█▊ | 1362/7378 [4:40:38<20:36:23, 12.33s/it] + 18%|█▊ | 1363/7378 [4:40:51<20:33:59, 12.31s/it] + +{'loss': 0.4814, 'learning_rate': 1.877142884178806e-05, 'epoch': 0.18} + + 18%|█▊ | 1363/7378 [4:40:51<20:33:59, 12.31s/it] + 18%|█▊ | 1364/7378 [4:41:03<20:37:03, 12.34s/it] + +{'loss': 0.4866, 'learning_rate': 1.8769319716211658e-05, 'epoch': 0.18} + + 18%|█▊ | 1364/7378 [4:41:03<20:37:03, 12.34s/it] + 19%|█▊ | 1365/7378 [4:41:15<20:27:45, 12.25s/it] + +{'loss': 0.5015, 'learning_rate': 1.8767208900486314e-05, 'epoch': 0.19} + + 19%|█▊ | 1365/7378 [4:41:15<20:27:45, 12.25s/it] + 19%|█▊ | 1366/7378 [4:41:28<20:40:43, 12.38s/it] + +{'loss': 0.5252, 'learning_rate': 1.876509639501885e-05, 'epoch': 0.19} + + 19%|█▊ | 1366/7378 [4:41:28<20:40:43, 12.38s/it] + 19%|█▊ | 1367/7378 [4:41:40<20:35:19, 12.33s/it] + +{'loss': 0.5022, 'learning_rate': 1.8762982200216417e-05, 'epoch': 0.19} + + 19%|█▊ | 1367/7378 [4:41:40<20:35:19, 12.33s/it] + 19%|█▊ | 1368/7378 [4:41:52<20:40:38, 12.39s/it] + +{'loss': 0.4767, 'learning_rate': 1.87608663164865e-05, 'epoch': 0.19} + + 19%|█▊ | 1368/7378 [4:41:52<20:40:38, 12.39s/it] + 19%|█▊ | 1369/7378 [4:42:05<20:41:37, 12.40s/it] + +{'loss': 0.4892, 'learning_rate': 1.8758748744236895e-05, 'epoch': 0.19} + + 19%|█▊ | 1369/7378 [4:42:05<20:41:37, 12.40s/it] + 19%|█▊ | 1370/7378 [4:42:17<20:35:08, 12.34s/it] + +{'loss': 0.4656, 'learning_rate': 1.8756629483875735e-05, 'epoch': 0.19} + + 19%|█▊ | 1370/7378 [4:42:17<20:35:08, 12.34s/it] + 19%|█▊ | 1371/7378 [4:42:29<20:31:41, 12.30s/it] + +{'loss': 0.4516, 'learning_rate': 1.8754508535811477e-05, 'epoch': 0.19} + + 19%|█▊ | 1371/7378 [4:42:29<20:31:41, 12.30s/it] + 19%|█▊ | 1372/7378 [4:42:41<20:21:54, 12.21s/it] + +{'loss': 0.5249, 'learning_rate': 1.8752385900452892e-05, 'epoch': 0.19} + + 19%|█▊ | 1372/7378 
[4:42:41<20:21:54, 12.21s/it] + 19%|█▊ | 1373/7378 [4:42:53<20:16:46, 12.16s/it] + +{'loss': 0.5178, 'learning_rate': 1.875026157820909e-05, 'epoch': 0.19} + + 19%|█▊ | 1373/7378 [4:42:53<20:16:46, 12.16s/it] + 19%|█▊ | 1374/7378 [4:43:06<20:22:34, 12.22s/it] + +{'loss': 0.5367, 'learning_rate': 1.8748135569489504e-05, 'epoch': 0.19} + + 19%|█▊ | 1374/7378 [4:43:06<20:22:34, 12.22s/it] + 19%|█▊ | 1375/7378 [4:43:18<20:16:10, 12.16s/it] + +{'loss': 0.3904, 'learning_rate': 1.8746007874703883e-05, 'epoch': 0.19} + + 19%|█▊ | 1375/7378 [4:43:18<20:16:10, 12.16s/it] + 19%|█▊ | 1376/7378 [4:43:30<20:21:31, 12.21s/it] + +{'loss': 0.4929, 'learning_rate': 1.8743878494262304e-05, 'epoch': 0.19} + + 19%|█▊ | 1376/7378 [4:43:30<20:21:31, 12.21s/it] + 19%|█▊ | 1377/7378 [4:43:42<20:19:44, 12.20s/it] + +{'loss': 0.4332, 'learning_rate': 1.8741747428575184e-05, 'epoch': 0.19} + + 19%|█▊ | 1377/7378 [4:43:42<20:19:44, 12.20s/it] + 19%|█▊ | 1378/7378 [4:43:55<20:48:28, 12.48s/it] + +{'loss': 0.589, 'learning_rate': 1.873961467805324e-05, 'epoch': 0.19} + + 19%|█▊ | 1378/7378 [4:43:55<20:48:28, 12.48s/it] + 19%|█▊ | 1379/7378 [4:44:08<20:54:02, 12.54s/it] + +{'loss': 0.4887, 'learning_rate': 1.8737480243107533e-05, 'epoch': 0.19} + + 19%|█▊ | 1379/7378 [4:44:08<20:54:02, 12.54s/it] + 19%|█▊ | 1380/7378 [4:44:20<20:45:43, 12.46s/it] + +{'loss': 0.4737, 'learning_rate': 1.873534412414944e-05, 'epoch': 0.19} + + 19%|█▊ | 1380/7378 [4:44:20<20:45:43, 12.46s/it] + 19%|█▊ | 1381/7378 [4:44:33<20:48:10, 12.49s/it] + +{'loss': 0.5094, 'learning_rate': 1.8733206321590667e-05, 'epoch': 0.19} + + 19%|█▊ | 1381/7378 [4:44:33<20:48:10, 12.49s/it] + 19%|█▊ | 1382/7378 [4:44:45<20:51:09, 12.52s/it] + +{'loss': 0.4369, 'learning_rate': 1.8731066835843237e-05, 'epoch': 0.19} + + 19%|█▊ | 1382/7378 [4:44:45<20:51:09, 12.52s/it] + 19%|█▊ | 1383/7378 [4:44:58<20:52:39, 12.54s/it] + +{'loss': 0.4766, 'learning_rate': 1.8728925667319506e-05, 'epoch': 0.19} + + 19%|█▊ | 1383/7378 [4:44:58<20:52:39, 
12.54s/it] + 19%|█▉ | 1384/7378 [4:45:10<20:51:20, 12.53s/it] + +{'loss': 0.5173, 'learning_rate': 1.872678281643215e-05, 'epoch': 0.19} + + 19%|█▉ | 1384/7378 [4:45:10<20:51:20, 12.53s/it] + 19%|█▉ | 1385/7378 [4:45:23<20:46:49, 12.48s/it] + +{'loss': 0.4877, 'learning_rate': 1.872463828359417e-05, 'epoch': 0.19} + + 19%|█▉ | 1385/7378 [4:45:23<20:46:49, 12.48s/it] + 19%|█▉ | 1386/7378 [4:45:35<20:38:36, 12.40s/it] + +{'loss': 0.481, 'learning_rate': 1.8722492069218886e-05, 'epoch': 0.19} + + 19%|█▉ | 1386/7378 [4:45:35<20:38:36, 12.40s/it] + 19%|█▉ | 1387/7378 [4:45:47<20:37:13, 12.39s/it] + +{'loss': 0.3909, 'learning_rate': 1.8720344173719957e-05, 'epoch': 0.19} + + 19%|█▉ | 1387/7378 [4:45:47<20:37:13, 12.39s/it] + 19%|█▉ | 1388/7378 [4:46:00<20:39:39, 12.42s/it] + +{'loss': 0.4597, 'learning_rate': 1.871819459751135e-05, 'epoch': 0.19} + + 19%|█▉ | 1388/7378 [4:46:00<20:39:39, 12.42s/it] + 19%|█▉ | 1389/7378 [4:46:13<20:45:26, 12.48s/it] + +{'loss': 0.4953, 'learning_rate': 1.8716043341007363e-05, 'epoch': 0.19} + + 19%|█▉ | 1389/7378 [4:46:13<20:45:26, 12.48s/it] + 19%|█▉ | 1390/7378 [4:46:25<20:46:36, 12.49s/it] + +{'loss': 0.4899, 'learning_rate': 1.8713890404622618e-05, 'epoch': 0.19} + + 19%|█▉ | 1390/7378 [4:46:25<20:46:36, 12.49s/it] + 19%|█▉ | 1391/7378 [4:46:37<20:36:32, 12.39s/it] + +{'loss': 0.5058, 'learning_rate': 1.8711735788772058e-05, 'epoch': 0.19} + + 19%|█▉ | 1391/7378 [4:46:37<20:36:32, 12.39s/it] + 19%|█▉ | 1392/7378 [4:46:49<20:25:48, 12.29s/it] + +{'loss': 0.5618, 'learning_rate': 1.8709579493870953e-05, 'epoch': 0.19} + + 19%|█▉ | 1392/7378 [4:46:49<20:25:48, 12.29s/it] + 19%|█▉ | 1393/7378 [4:47:02<20:27:56, 12.31s/it] + +{'loss': 0.534, 'learning_rate': 1.8707421520334895e-05, 'epoch': 0.19} + + 19%|█▉ | 1393/7378 [4:47:02<20:27:56, 12.31s/it] + 19%|█▉ | 1394/7378 [4:47:14<20:22:37, 12.26s/it] + +{'loss': 0.5137, 'learning_rate': 1.8705261868579797e-05, 'epoch': 0.19} + + 19%|█▉ | 1394/7378 [4:47:14<20:22:37, 12.26s/it] + 19%|█▉ | 
1395/7378 [4:47:26<20:26:40, 12.30s/it] + +{'loss': 0.4886, 'learning_rate': 1.8703100539021902e-05, 'epoch': 0.19} + + 19%|█▉ | 1395/7378 [4:47:26<20:26:40, 12.30s/it] + 19%|█▉ | 1396/7378 [4:47:39<20:36:56, 12.41s/it] + +{'loss': 0.4427, 'learning_rate': 1.870093753207777e-05, 'epoch': 0.19} + + 19%|█▉ | 1396/7378 [4:47:39<20:36:56, 12.41s/it] + 19%|█▉ | 1397/7378 [4:47:51<20:26:52, 12.31s/it] + +{'loss': 0.5062, 'learning_rate': 1.8698772848164286e-05, 'epoch': 0.19} + + 19%|█▉ | 1397/7378 [4:47:51<20:26:52, 12.31s/it] + 19%|█▉ | 1398/7378 [4:48:03<20:25:41, 12.30s/it] + +{'loss': 0.5216, 'learning_rate': 1.869660648769866e-05, 'epoch': 0.19} + + 19%|█▉ | 1398/7378 [4:48:03<20:25:41, 12.30s/it] + 19%|█▉ | 1399/7378 [4:48:15<20:17:55, 12.22s/it] + +{'loss': 0.4616, 'learning_rate': 1.8694438451098423e-05, 'epoch': 0.19} + + 19%|█▉ | 1399/7378 [4:48:15<20:17:55, 12.22s/it] + 19%|█▉ | 1400/7378 [4:48:28<20:28:14, 12.33s/it] + +{'loss': 0.4731, 'learning_rate': 1.8692268738781435e-05, 'epoch': 0.19} + + 19%|█▉ | 1400/7378 [4:48:28<20:28:14, 12.33s/it] + 19%|█▉ | 1401/7378 [4:48:40<20:27:27, 12.32s/it] + +{'loss': 0.5091, 'learning_rate': 1.8690097351165868e-05, 'epoch': 0.19} + + 19%|█▉ | 1401/7378 [4:48:40<20:27:27, 12.32s/it] + 19%|█▉ | 1402/7378 [4:48:52<20:28:29, 12.33s/it] + +{'loss': 0.4852, 'learning_rate': 1.8687924288670224e-05, 'epoch': 0.19} + + 19%|█▉ | 1402/7378 [4:48:52<20:28:29, 12.33s/it] + 19%|█▉ | 1403/7378 [4:49:05<20:38:23, 12.44s/it] + +{'loss': 0.5108, 'learning_rate': 1.8685749551713332e-05, 'epoch': 0.19} + + 19%|█▉ | 1403/7378 [4:49:05<20:38:23, 12.44s/it] + 19%|█▉ | 1404/7378 [4:49:17<20:29:18, 12.35s/it] + +{'loss': 0.4736, 'learning_rate': 1.8683573140714332e-05, 'epoch': 0.19} + + 19%|█▉ | 1404/7378 [4:49:17<20:29:18, 12.35s/it] + 19%|█▉ | 1405/7378 [4:49:29<20:26:26, 12.32s/it] + +{'loss': 0.4681, 'learning_rate': 1.8681395056092694e-05, 'epoch': 0.19} + + 19%|█▉ | 1405/7378 [4:49:30<20:26:26, 12.32s/it] + 19%|█▉ | 1406/7378 
[4:49:41<20:16:08, 12.22s/it] + +{'loss': 0.5011, 'learning_rate': 1.867921529826821e-05, 'epoch': 0.19} + + 19%|█▉ | 1406/7378 [4:49:41<20:16:08, 12.22s/it] + 19%|█▉ | 1407/7378 [4:49:54<20:26:14, 12.32s/it] + +{'loss': 0.4398, 'learning_rate': 1.8677033867661e-05, 'epoch': 0.19} + + 19%|█▉ | 1407/7378 [4:49:54<20:26:14, 12.32s/it] + 19%|█▉ | 1408/7378 [4:50:07<20:37:59, 12.44s/it] + +{'loss': 0.5299, 'learning_rate': 1.867485076469149e-05, 'epoch': 0.19} + + 19%|█▉ | 1408/7378 [4:50:07<20:37:59, 12.44s/it] + 19%|█▉ | 1409/7378 [4:50:19<20:37:57, 12.44s/it] + +{'loss': 0.4856, 'learning_rate': 1.8672665989780448e-05, 'epoch': 0.19} + + 19%|█▉ | 1409/7378 [4:50:19<20:37:57, 12.44s/it] + 19%|█▉ | 1410/7378 [4:50:32<20:36:30, 12.43s/it] + +{'loss': 0.5549, 'learning_rate': 1.867047954334895e-05, 'epoch': 0.19} + + 19%|█▉ | 1410/7378 [4:50:32<20:36:30, 12.43s/it] + 19%|█▉ | 1411/7378 [4:50:44<20:41:15, 12.48s/it] + +{'loss': 0.4925, 'learning_rate': 1.8668291425818402e-05, 'epoch': 0.19} + + 19%|█▉ | 1411/7378 [4:50:44<20:41:15, 12.48s/it] + 19%|█▉ | 1412/7378 [4:50:56<20:32:22, 12.39s/it] + +{'loss': 0.5044, 'learning_rate': 1.8666101637610533e-05, 'epoch': 0.19} + + 19%|█▉ | 1412/7378 [4:50:56<20:32:22, 12.39s/it] + 19%|█▉ | 1413/7378 [4:51:09<20:32:06, 12.39s/it] + +{'loss': 0.4645, 'learning_rate': 1.866391017914738e-05, 'epoch': 0.19} + + 19%|█▉ | 1413/7378 [4:51:09<20:32:06, 12.39s/it] + 19%|█▉ | 1414/7378 [4:51:21<20:40:20, 12.48s/it] + +{'loss': 0.5616, 'learning_rate': 1.8661717050851323e-05, 'epoch': 0.19} + + 19%|█▉ | 1414/7378 [4:51:21<20:40:20, 12.48s/it] + 19%|█▉ | 1415/7378 [4:51:34<20:41:11, 12.49s/it] + +{'loss': 0.4656, 'learning_rate': 1.8659522253145042e-05, 'epoch': 0.19} + + 19%|█▉ | 1415/7378 [4:51:34<20:41:11, 12.49s/it] + 19%|█▉ | 1416/7378 [4:51:46<20:30:34, 12.38s/it] + +{'loss': 0.4983, 'learning_rate': 1.8657325786451562e-05, 'epoch': 0.19} + + 19%|█▉ | 1416/7378 [4:51:46<20:30:34, 12.38s/it] + 19%|█▉ | 1417/7378 [4:51:58<20:23:33, 
12.32s/it] + +{'loss': 0.4752, 'learning_rate': 1.8655127651194208e-05, 'epoch': 0.19} + + 19%|█▉ | 1417/7378 [4:51:58<20:23:33, 12.32s/it] + 19%|█▉ | 1418/7378 [4:52:11<20:27:10, 12.35s/it] + +{'loss': 0.514, 'learning_rate': 1.8652927847796642e-05, 'epoch': 0.19} + + 19%|█▉ | 1418/7378 [4:52:11<20:27:10, 12.35s/it] + 19%|█▉ | 1419/7378 [4:52:23<20:24:43, 12.33s/it] + +{'loss': 0.4777, 'learning_rate': 1.8650726376682838e-05, 'epoch': 0.19} + + 19%|█▉ | 1419/7378 [4:52:23<20:24:43, 12.33s/it] + 19%|█▉ | 1420/7378 [4:52:35<20:20:28, 12.29s/it] + +{'loss': 0.5303, 'learning_rate': 1.8648523238277096e-05, 'epoch': 0.19} + + 19%|█▉ | 1420/7378 [4:52:35<20:20:28, 12.29s/it] + 19%|█▉ | 1421/7378 [4:52:47<20:12:28, 12.21s/it] + +{'loss': 0.5224, 'learning_rate': 1.864631843300404e-05, 'epoch': 0.19} + + 19%|█▉ | 1421/7378 [4:52:47<20:12:28, 12.21s/it] + 19%|█▉ | 1422/7378 [4:53:00<20:16:40, 12.26s/it] + +{'loss': 0.5262, 'learning_rate': 1.8644111961288605e-05, 'epoch': 0.19} + + 19%|█▉ | 1422/7378 [4:53:00<20:16:40, 12.26s/it] + 19%|█▉ | 1423/7378 [4:53:13<20:37:12, 12.47s/it] + +{'loss': 0.5432, 'learning_rate': 1.8641903823556057e-05, 'epoch': 0.19} + + 19%|█▉ | 1423/7378 [4:53:13<20:37:12, 12.47s/it] + 19%|█▉ | 1424/7378 [4:53:25<20:31:31, 12.41s/it] + +{'loss': 0.4646, 'learning_rate': 1.8639694020231982e-05, 'epoch': 0.19} + + 19%|█▉ | 1424/7378 [4:53:25<20:31:31, 12.41s/it] + 19%|█▉ | 1425/7378 [4:53:38<20:39:48, 12.50s/it] + +{'loss': 0.458, 'learning_rate': 1.8637482551742283e-05, 'epoch': 0.19} + + 19%|█▉ | 1425/7378 [4:53:38<20:39:48, 12.50s/it] + 19%|█▉ | 1426/7378 [4:53:50<20:38:11, 12.48s/it] + +{'loss': 0.4004, 'learning_rate': 1.8635269418513185e-05, 'epoch': 0.19} + + 19%|█▉ | 1426/7378 [4:53:50<20:38:11, 12.48s/it] + 19%|█▉ | 1427/7378 [4:54:02<20:23:25, 12.33s/it] + +{'loss': 0.5064, 'learning_rate': 1.8633054620971238e-05, 'epoch': 0.19} + + 19%|█▉ | 1427/7378 [4:54:02<20:23:25, 12.33s/it] + 19%|█▉ | 1428/7378 [4:54:14<20:11:54, 12.22s/it] + +{'loss': 
0.5084, 'learning_rate': 1.8630838159543306e-05, 'epoch': 0.19} + + 19%|█▉ | 1428/7378 [4:54:14<20:11:54, 12.22s/it] + 19%|█▉ | 1429/7378 [4:54:26<20:15:06, 12.26s/it] + +{'loss': 0.5322, 'learning_rate': 1.8628620034656578e-05, 'epoch': 0.19} + + 19%|█▉ | 1429/7378 [4:54:26<20:15:06, 12.26s/it] + 19%|█▉ | 1430/7378 [4:54:38<19:54:52, 12.05s/it] + +{'loss': 0.5112, 'learning_rate': 1.8626400246738568e-05, 'epoch': 0.19} + + 19%|█▉ | 1430/7378 [4:54:38<19:54:52, 12.05s/it] + 19%|█▉ | 1431/7378 [4:54:50<20:07:44, 12.19s/it] + +{'loss': 0.5152, 'learning_rate': 1.8624178796217096e-05, 'epoch': 0.19} + + 19%|█▉ | 1431/7378 [4:54:50<20:07:44, 12.19s/it] + 19%|█▉ | 1432/7378 [4:55:02<20:01:19, 12.12s/it] + +{'loss': 0.5886, 'learning_rate': 1.8621955683520317e-05, 'epoch': 0.19} + + 19%|█▉ | 1432/7378 [4:55:02<20:01:19, 12.12s/it] + 19%|█▉ | 1433/7378 [4:55:14<19:54:58, 12.06s/it] + +{'loss': 0.4784, 'learning_rate': 1.8619730909076704e-05, 'epoch': 0.19} + + 19%|█▉ | 1433/7378 [4:55:14<19:54:58, 12.06s/it] + 19%|█▉ | 1434/7378 [4:55:27<20:01:54, 12.13s/it] + +{'loss': 0.47, 'learning_rate': 1.861750447331504e-05, 'epoch': 0.19} + + 19%|█▉ | 1434/7378 [4:55:27<20:01:54, 12.13s/it] + 19%|█▉ | 1435/7378 [4:55:39<20:06:29, 12.18s/it] + +{'loss': 0.4503, 'learning_rate': 1.861527637666444e-05, 'epoch': 0.19} + + 19%|█▉ | 1435/7378 [4:55:39<20:06:29, 12.18s/it] + 19%|█▉ | 1436/7378 [4:55:51<20:02:50, 12.15s/it] + +{'loss': 0.4899, 'learning_rate': 1.8613046619554337e-05, 'epoch': 0.19} + + 19%|█▉ | 1436/7378 [4:55:51<20:02:50, 12.15s/it] + 19%|█▉ | 1437/7378 [4:56:03<20:03:40, 12.16s/it] + +{'loss': 0.5117, 'learning_rate': 1.8610815202414477e-05, 'epoch': 0.19} + + 19%|█▉ | 1437/7378 [4:56:03<20:03:40, 12.16s/it] + 19%|█▉ | 1438/7378 [4:56:15<20:02:24, 12.15s/it] + +{'loss': 0.558, 'learning_rate': 1.8608582125674933e-05, 'epoch': 0.19} + + 19%|█▉ | 1438/7378 [4:56:15<20:02:24, 12.15s/it] + 20%|█▉ | 1439/7378 [4:56:28<20:09:09, 12.22s/it] + +{'loss': 0.5254, 'learning_rate': 
1.8606347389766094e-05, 'epoch': 0.2} + + 20%|█▉ | 1439/7378 [4:56:28<20:09:09, 12.22s/it] + 20%|█▉ | 1440/7378 [4:56:40<20:15:04, 12.28s/it] + +{'loss': 0.5612, 'learning_rate': 1.8604110995118675e-05, 'epoch': 0.2} + + 20%|█▉ | 1440/7378 [4:56:40<20:15:04, 12.28s/it] + 20%|█▉ | 1441/7378 [4:56:52<20:12:12, 12.25s/it] + +{'loss': 0.5071, 'learning_rate': 1.86018729421637e-05, 'epoch': 0.2} + + 20%|█▉ | 1441/7378 [4:56:52<20:12:12, 12.25s/it] + 20%|█▉ | 1442/7378 [4:57:05<20:33:24, 12.47s/it] + +{'loss': 0.5871, 'learning_rate': 1.8599633231332522e-05, 'epoch': 0.2} + + 20%|█▉ | 1442/7378 [4:57:05<20:33:24, 12.47s/it] + 20%|█▉ | 1443/7378 [4:57:18<20:36:05, 12.50s/it] + +{'loss': 0.48, 'learning_rate': 1.859739186305681e-05, 'epoch': 0.2} + + 20%|█▉ | 1443/7378 [4:57:18<20:36:05, 12.50s/it] + 20%|█▉ | 1444/7378 [4:57:31<20:53:24, 12.67s/it] + +{'loss': 0.5239, 'learning_rate': 1.8595148837768554e-05, 'epoch': 0.2} + + 20%|█▉ | 1444/7378 [4:57:31<20:53:24, 12.67s/it] + 20%|█▉ | 1445/7378 [4:57:43<20:39:03, 12.53s/it] + +{'loss': 0.5375, 'learning_rate': 1.8592904155900057e-05, 'epoch': 0.2} + + 20%|█▉ | 1445/7378 [4:57:43<20:39:03, 12.53s/it] + 20%|█▉ | 1446/7378 [4:57:55<20:29:05, 12.43s/it] + +{'loss': 0.469, 'learning_rate': 1.8590657817883952e-05, 'epoch': 0.2} + + 20%|█▉ | 1446/7378 [4:57:55<20:29:05, 12.43s/it] + 20%|█▉ | 1447/7378 [4:58:07<20:24:38, 12.39s/it] + +{'loss': 0.5096, 'learning_rate': 1.858840982415318e-05, 'epoch': 0.2} + + 20%|█▉ | 1447/7378 [4:58:07<20:24:38, 12.39s/it] + 20%|█▉ | 1448/7378 [4:58:20<20:29:27, 12.44s/it] + +{'loss': 0.53, 'learning_rate': 1.858616017514101e-05, 'epoch': 0.2} + + 20%|█▉ | 1448/7378 [4:58:20<20:29:27, 12.44s/it] + 20%|█▉ | 1449/7378 [4:58:32<20:11:05, 12.26s/it] + +{'loss': 0.4787, 'learning_rate': 1.8583908871281026e-05, 'epoch': 0.2} + + 20%|█▉ | 1449/7378 [4:58:32<20:11:05, 12.26s/it] + 20%|█▉ | 1450/7378 [4:58:44<20:17:03, 12.32s/it] + +{'loss': 0.4737, 'learning_rate': 1.8581655913007136e-05, 'epoch': 0.2} + 
+ 20%|█▉ | 1450/7378 [4:58:44<20:17:03, 12.32s/it] + 20%|█▉ | 1451/7378 [4:58:56<20:11:12, 12.26s/it] + +{'loss': 0.4753, 'learning_rate': 1.8579401300753552e-05, 'epoch': 0.2} + + 20%|█▉ | 1451/7378 [4:58:56<20:11:12, 12.26s/it] + 20%|█▉ | 1452/7378 [4:59:09<20:09:03, 12.24s/it] + +{'loss': 0.3974, 'learning_rate': 1.8577145034954822e-05, 'epoch': 0.2} + + 20%|█▉ | 1452/7378 [4:59:09<20:09:03, 12.24s/it] + 20%|█▉ | 1453/7378 [4:59:21<20:06:26, 12.22s/it] + +{'loss': 0.4623, 'learning_rate': 1.857488711604581e-05, 'epoch': 0.2} + + 20%|█▉ | 1453/7378 [4:59:21<20:06:26, 12.22s/it] + 20%|█▉ | 1454/7378 [4:59:33<20:06:08, 12.22s/it] + +{'loss': 0.5074, 'learning_rate': 1.8572627544461682e-05, 'epoch': 0.2} + + 20%|█▉ | 1454/7378 [4:59:33<20:06:08, 12.22s/it] + 20%|█▉ | 1455/7378 [4:59:45<20:10:38, 12.26s/it] + +{'loss': 0.5923, 'learning_rate': 1.8570366320637947e-05, 'epoch': 0.2} + + 20%|█▉ | 1455/7378 [4:59:45<20:10:38, 12.26s/it] + 20%|█▉ | 1456/7378 [4:59:58<20:08:18, 12.24s/it] + +{'loss': 0.4893, 'learning_rate': 1.8568103445010413e-05, 'epoch': 0.2} + + 20%|█▉ | 1456/7378 [4:59:58<20:08:18, 12.24s/it] + 20%|█▉ | 1457/7378 [5:00:10<20:02:12, 12.18s/it] + +{'loss': 0.5064, 'learning_rate': 1.8565838918015218e-05, 'epoch': 0.2} + + 20%|█▉ | 1457/7378 [5:00:10<20:02:12, 12.18s/it] + 20%|█▉ | 1458/7378 [5:00:22<20:11:01, 12.27s/it] + +{'loss': 0.4962, 'learning_rate': 1.8563572740088813e-05, 'epoch': 0.2} + + 20%|█▉ | 1458/7378 [5:00:22<20:11:01, 12.27s/it] + 20%|█▉ | 1459/7378 [5:00:34<20:11:33, 12.28s/it] + +{'loss': 0.4952, 'learning_rate': 1.8561304911667967e-05, 'epoch': 0.2} + + 20%|█▉ | 1459/7378 [5:00:34<20:11:33, 12.28s/it] + 20%|█▉ | 1460/7378 [5:00:47<20:09:10, 12.26s/it] + +{'loss': 0.4964, 'learning_rate': 1.855903543318977e-05, 'epoch': 0.2} + + 20%|█▉ | 1460/7378 [5:00:47<20:09:10, 12.26s/it] + 20%|█▉ | 1461/7378 [5:00:59<20:10:32, 12.28s/it] + +{'loss': 0.5014, 'learning_rate': 1.855676430509163e-05, 'epoch': 0.2} + + 20%|█▉ | 1461/7378 
[5:00:59<20:10:32, 12.28s/it] + 20%|█▉ | 1462/7378 [5:01:11<20:09:39, 12.27s/it] + +{'loss': 0.4944, 'learning_rate': 1.8554491527811266e-05, 'epoch': 0.2} + + 20%|█▉ | 1462/7378 [5:01:11<20:09:39, 12.27s/it] + 20%|█▉ | 1463/7378 [5:01:23<20:04:37, 12.22s/it] + +{'loss': 0.4667, 'learning_rate': 1.8552217101786728e-05, 'epoch': 0.2} + + 20%|█▉ | 1463/7378 [5:01:23<20:04:37, 12.22s/it] + 20%|█▉ | 1464/7378 [5:01:36<20:13:42, 12.31s/it] + +{'loss': 0.5277, 'learning_rate': 1.8549941027456365e-05, 'epoch': 0.2} + + 20%|█▉ | 1464/7378 [5:01:36<20:13:42, 12.31s/it] + 20%|█▉ | 1465/7378 [5:01:49<20:30:28, 12.49s/it] + +{'loss': 0.5659, 'learning_rate': 1.8547663305258864e-05, 'epoch': 0.2} + + 20%|█▉ | 1465/7378 [5:01:49<20:30:28, 12.49s/it] + 20%|█▉ | 1466/7378 [5:02:01<20:25:13, 12.43s/it] + +{'loss': 0.5423, 'learning_rate': 1.8545383935633217e-05, 'epoch': 0.2} + + 20%|█▉ | 1466/7378 [5:02:01<20:25:13, 12.43s/it] + 20%|█▉ | 1467/7378 [5:02:13<20:06:33, 12.25s/it] + +{'loss': 0.4869, 'learning_rate': 1.8543102919018738e-05, 'epoch': 0.2} + + 20%|█▉ | 1467/7378 [5:02:13<20:06:33, 12.25s/it] + 20%|█▉ | 1468/7378 [5:02:25<20:17:45, 12.36s/it] + +{'loss': 0.5409, 'learning_rate': 1.854082025585506e-05, 'epoch': 0.2} + + 20%|█▉ | 1468/7378 [5:02:25<20:17:45, 12.36s/it] + 20%|█▉ | 1469/7378 [5:02:38<20:12:54, 12.32s/it] + +{'loss': 0.5287, 'learning_rate': 1.8538535946582122e-05, 'epoch': 0.2} + + 20%|█▉ | 1469/7378 [5:02:38<20:12:54, 12.32s/it] + 20%|█▉ | 1470/7378 [5:02:50<20:14:43, 12.34s/it] + +{'loss': 0.4466, 'learning_rate': 1.8536249991640192e-05, 'epoch': 0.2} + + 20%|█▉ | 1470/7378 [5:02:50<20:14:43, 12.34s/it] + 20%|█▉ | 1471/7378 [5:03:03<20:23:19, 12.43s/it] + +{'loss': 0.4696, 'learning_rate': 1.8533962391469855e-05, 'epoch': 0.2} + + 20%|█▉ | 1471/7378 [5:03:03<20:23:19, 12.43s/it] + 20%|█▉ | 1472/7378 [5:03:15<20:28:04, 12.48s/it] + +{'loss': 0.4724, 'learning_rate': 1.853167314651201e-05, 'epoch': 0.2} + + 20%|█▉ | 1472/7378 [5:03:15<20:28:04, 12.48s/it] + 
20%|█▉ | 1473/7378 [5:03:28<20:25:30, 12.45s/it] + +{'loss': 0.4722, 'learning_rate': 1.852938225720787e-05, 'epoch': 0.2} + + 20%|█▉ | 1473/7378 [5:03:28<20:25:30, 12.45s/it] + 20%|█▉ | 1474/7378 [5:03:40<20:24:23, 12.44s/it] + +{'loss': 0.5365, 'learning_rate': 1.8527089723998973e-05, 'epoch': 0.2} + + 20%|█▉ | 1474/7378 [5:03:40<20:24:23, 12.44s/it] + 20%|█▉ | 1475/7378 [5:03:53<20:22:47, 12.43s/it] + +{'loss': 0.5354, 'learning_rate': 1.8524795547327163e-05, 'epoch': 0.2} + + 20%|█▉ | 1475/7378 [5:03:53<20:22:47, 12.43s/it] + 20%|██ | 1476/7378 [5:04:05<20:16:55, 12.37s/it] + +{'loss': 0.5011, 'learning_rate': 1.8522499727634612e-05, 'epoch': 0.2} + + 20%|██ | 1476/7378 [5:04:05<20:16:55, 12.37s/it] + 20%|██ | 1477/7378 [5:04:17<20:15:08, 12.36s/it] + +{'loss': 0.4897, 'learning_rate': 1.85202022653638e-05, 'epoch': 0.2} + + 20%|██ | 1477/7378 [5:04:17<20:15:08, 12.36s/it] + 20%|██ | 1478/7378 [5:04:29<20:07:26, 12.28s/it] + +{'loss': 0.4272, 'learning_rate': 1.8517903160957523e-05, 'epoch': 0.2} + + 20%|██ | 1478/7378 [5:04:29<20:07:26, 12.28s/it] + 20%|██ | 1479/7378 [5:04:41<20:03:12, 12.24s/it] + +{'loss': 0.5337, 'learning_rate': 1.8515602414858907e-05, 'epoch': 0.2} + + 20%|██ | 1479/7378 [5:04:41<20:03:12, 12.24s/it] + 20%|██ | 1480/7378 [5:04:53<20:01:38, 12.22s/it] + +{'loss': 0.4739, 'learning_rate': 1.8513300027511377e-05, 'epoch': 0.2} + + 20%|██ | 1480/7378 [5:04:54<20:01:38, 12.22s/it] + 20%|██ | 1481/7378 [5:05:06<20:04:52, 12.26s/it] + +{'loss': 0.5407, 'learning_rate': 1.8510995999358683e-05, 'epoch': 0.2} + + 20%|██ | 1481/7378 [5:05:06<20:04:52, 12.26s/it] + 20%|██ | 1482/7378 [5:05:18<20:07:31, 12.29s/it] + +{'loss': 0.5002, 'learning_rate': 1.8508690330844893e-05, 'epoch': 0.2} + + 20%|██ | 1482/7378 [5:05:18<20:07:31, 12.29s/it] + 20%|██ | 1483/7378 [5:05:30<20:06:33, 12.28s/it] + +{'loss': 0.548, 'learning_rate': 1.850638302241439e-05, 'epoch': 0.2} + + 20%|██ | 1483/7378 [5:05:30<20:06:33, 12.28s/it] + 20%|██ | 1484/7378 
[5:05:43<20:07:49, 12.30s/it] + +{'loss': 0.4593, 'learning_rate': 1.8504074074511866e-05, 'epoch': 0.2} + + 20%|██ | 1484/7378 [5:05:43<20:07:49, 12.30s/it] + 20%|██ | 1485/7378 [5:05:55<20:02:36, 12.24s/it] + +{'loss': 0.5243, 'learning_rate': 1.8501763487582338e-05, 'epoch': 0.2} + + 20%|██ | 1485/7378 [5:05:55<20:02:36, 12.24s/it] + 20%|██ | 1486/7378 [5:06:07<19:52:32, 12.14s/it] + +{'loss': 0.5442, 'learning_rate': 1.8499451262071134e-05, 'epoch': 0.2} + + 20%|██ | 1486/7378 [5:06:07<19:52:32, 12.14s/it] + 20%|██ | 1487/7378 [5:06:19<19:54:45, 12.17s/it] + +{'loss': 0.5131, 'learning_rate': 1.8497137398423903e-05, 'epoch': 0.2} + + 20%|██ | 1487/7378 [5:06:19<19:54:45, 12.17s/it] + 20%|██ | 1488/7378 [5:06:31<19:51:41, 12.14s/it] + +{'loss': 0.5262, 'learning_rate': 1.8494821897086603e-05, 'epoch': 0.2} + + 20%|██ | 1488/7378 [5:06:31<19:51:41, 12.14s/it] + 20%|██ | 1489/7378 [5:06:43<19:53:28, 12.16s/it] + +{'loss': 0.4984, 'learning_rate': 1.8492504758505506e-05, 'epoch': 0.2} + + 20%|██ | 1489/7378 [5:06:43<19:53:28, 12.16s/it] + 20%|██ | 1490/7378 [5:06:55<19:51:08, 12.14s/it] + +{'loss': 0.5761, 'learning_rate': 1.8490185983127212e-05, 'epoch': 0.2} + + 20%|██ | 1490/7378 [5:06:55<19:51:08, 12.14s/it] + 20%|██ | 1491/7378 [5:07:08<20:04:17, 12.27s/it] + +{'loss': 0.492, 'learning_rate': 1.8487865571398625e-05, 'epoch': 0.2} + + 20%|██ | 1491/7378 [5:07:08<20:04:17, 12.27s/it] + 20%|██ | 1492/7378 [5:07:20<19:49:31, 12.13s/it] + +{'loss': 0.4646, 'learning_rate': 1.8485543523766965e-05, 'epoch': 0.2} + + 20%|██ | 1492/7378 [5:07:20<19:49:31, 12.13s/it] + 20%|██ | 1493/7378 [5:07:32<19:45:44, 12.09s/it] + +{'loss': 0.4848, 'learning_rate': 1.8483219840679778e-05, 'epoch': 0.2} + + 20%|██ | 1493/7378 [5:07:32<19:45:44, 12.09s/it] + 20%|██ | 1494/7378 [5:07:44<19:44:48, 12.08s/it] + +{'loss': 0.4874, 'learning_rate': 1.848089452258491e-05, 'epoch': 0.2} + + 20%|██ | 1494/7378 [5:07:44<19:44:48, 12.08s/it] + 20%|██ | 1495/7378 [5:07:56<19:49:49, 12.13s/it] + 
+{'loss': 0.4343, 'learning_rate': 1.8478567569930536e-05, 'epoch': 0.2} + + 20%|██ | 1495/7378 [5:07:56<19:49:49, 12.13s/it] + 20%|██ | 1496/7378 [5:08:08<19:51:36, 12.16s/it] + +{'loss': 0.4738, 'learning_rate': 1.8476238983165137e-05, 'epoch': 0.2} + + 20%|██ | 1496/7378 [5:08:08<19:51:36, 12.16s/it] + 20%|██ | 1497/7378 [5:08:20<19:47:33, 12.12s/it] + +{'loss': 0.5447, 'learning_rate': 1.847390876273751e-05, 'epoch': 0.2} + + 20%|██ | 1497/7378 [5:08:20<19:47:33, 12.12s/it] + 20%|██ | 1498/7378 [5:08:33<20:04:01, 12.29s/it] + +{'loss': 0.4244, 'learning_rate': 1.8471576909096768e-05, 'epoch': 0.2} + + 20%|██ | 1498/7378 [5:08:33<20:04:01, 12.29s/it] + 20%|██ | 1499/7378 [5:08:46<20:20:32, 12.46s/it] + +{'loss': 0.5189, 'learning_rate': 1.846924342269234e-05, 'epoch': 0.2} + + 20%|██ | 1499/7378 [5:08:46<20:20:32, 12.46s/it] + 20%|██ | 1500/7378 [5:08:58<20:18:59, 12.44s/it] + +{'loss': 0.4267, 'learning_rate': 1.8466908303973968e-05, 'epoch': 0.2} + + 20%|██ | 1500/7378 [5:08:58<20:18:59, 12.44s/it] + 20%|██ | 1501/7378 [5:09:11<20:23:12, 12.49s/it] + +{'loss': 0.4688, 'learning_rate': 1.8464571553391717e-05, 'epoch': 0.2} + + 20%|██ | 1501/7378 [5:09:11<20:23:12, 12.49s/it] + 20%|██ | 1502/7378 [5:09:23<20:16:09, 12.42s/it] + +{'loss': 0.5361, 'learning_rate': 1.846223317139595e-05, 'epoch': 0.2} + + 20%|██ | 1502/7378 [5:09:23<20:16:09, 12.42s/it] + 20%|██ | 1503/7378 [5:09:35<20:11:40, 12.37s/it] + +{'loss': 0.5404, 'learning_rate': 1.8459893158437357e-05, 'epoch': 0.2} + + 20%|██ | 1503/7378 [5:09:35<20:11:40, 12.37s/it] + 20%|██ | 1504/7378 [5:09:47<19:58:59, 12.25s/it] + +{'loss': 0.4925, 'learning_rate': 1.845755151496694e-05, 'epoch': 0.2} + + 20%|██ | 1504/7378 [5:09:47<19:58:59, 12.25s/it] + 20%|██ | 1505/7378 [5:10:00<20:06:20, 12.32s/it] + +{'loss': 0.4951, 'learning_rate': 1.8455208241436012e-05, 'epoch': 0.2} + + 20%|██ | 1505/7378 [5:10:00<20:06:20, 12.32s/it] + 20%|██ | 1506/7378 [5:10:12<20:03:28, 12.30s/it] + +{'loss': 0.4605, 'learning_rate': 
1.84528633382962e-05, 'epoch': 0.2} + + 20%|██ | 1506/7378 [5:10:12<20:03:28, 12.30s/it] + 20%|██ | 1507/7378 [5:10:24<20:05:09, 12.32s/it] + +{'loss': 0.5868, 'learning_rate': 1.8450516805999452e-05, 'epoch': 0.2} + + 20%|██ | 1507/7378 [5:10:24<20:05:09, 12.32s/it] + 20%|██ | 1508/7378 [5:10:37<19:58:05, 12.25s/it] + +{'loss': 0.4189, 'learning_rate': 1.8448168644998025e-05, 'epoch': 0.2} + + 20%|██ | 1508/7378 [5:10:37<19:58:05, 12.25s/it] + 20%|██ | 1509/7378 [5:10:49<20:14:21, 12.41s/it] + +{'loss': 0.4829, 'learning_rate': 1.844581885574449e-05, 'epoch': 0.2} + + 20%|██ | 1509/7378 [5:10:49<20:14:21, 12.41s/it] + 20%|██ | 1510/7378 [5:11:01<20:03:32, 12.31s/it] + +{'loss': 0.5216, 'learning_rate': 1.844346743869173e-05, 'epoch': 0.2} + + 20%|██ | 1510/7378 [5:11:01<20:03:32, 12.31s/it] + 20%|██ | 1511/7378 [5:11:14<20:21:34, 12.49s/it] + +{'loss': 0.529, 'learning_rate': 1.8441114394292943e-05, 'epoch': 0.2} + + 20%|██ | 1511/7378 [5:11:14<20:21:34, 12.49s/it] + 20%|██ | 1512/7378 [5:11:26<20:06:18, 12.34s/it] + +{'loss': 0.4417, 'learning_rate': 1.8438759723001643e-05, 'epoch': 0.2} + + 20%|██ | 1512/7378 [5:11:26<20:06:18, 12.34s/it] + 21%|██ | 1513/7378 [5:11:39<20:02:37, 12.30s/it] + +{'loss': 0.4878, 'learning_rate': 1.8436403425271655e-05, 'epoch': 0.21} + + 21%|██ | 1513/7378 [5:11:39<20:02:37, 12.30s/it] + 21%|██ | 1514/7378 [5:11:51<20:01:52, 12.30s/it] + +{'loss': 0.5353, 'learning_rate': 1.8434045501557122e-05, 'epoch': 0.21} + + 21%|██ | 1514/7378 [5:11:51<20:01:52, 12.30s/it] + 21%|██ | 1515/7378 [5:12:06<21:27:02, 13.17s/it] + +{'loss': 0.4611, 'learning_rate': 1.8431685952312492e-05, 'epoch': 0.21} + + 21%|██ | 1515/7378 [5:12:06<21:27:02, 13.17s/it] + 21%|██ | 1516/7378 [5:12:19<21:12:30, 13.02s/it] + +{'loss': 0.5061, 'learning_rate': 1.8429324777992534e-05, 'epoch': 0.21} + + 21%|██ | 1516/7378 [5:12:19<21:12:30, 13.02s/it] + 21%|██ | 1517/7378 [5:12:31<20:56:55, 12.87s/it] + +{'loss': 0.4953, 'learning_rate': 1.842696197905233e-05, 'epoch': 
0.21} + + 21%|██ | 1517/7378 [5:12:31<20:56:55, 12.87s/it] + 21%|██ | 1518/7378 [5:12:44<20:46:42, 12.77s/it] + +{'loss': 0.4779, 'learning_rate': 1.8424597555947268e-05, 'epoch': 0.21} + + 21%|██ | 1518/7378 [5:12:44<20:46:42, 12.77s/it] + 21%|██ | 1519/7378 [5:12:56<20:38:18, 12.68s/it] + +{'loss': 0.489, 'learning_rate': 1.8422231509133052e-05, 'epoch': 0.21} + + 21%|██ | 1519/7378 [5:12:56<20:38:18, 12.68s/it] + 21%|██ | 1520/7378 [5:13:08<20:21:38, 12.51s/it] + +{'loss': 0.4381, 'learning_rate': 1.8419863839065706e-05, 'epoch': 0.21} + + 21%|██ | 1520/7378 [5:13:08<20:21:38, 12.51s/it] + 21%|██ | 1521/7378 [5:13:20<20:09:48, 12.39s/it] + +{'loss': 0.4791, 'learning_rate': 1.8417494546201557e-05, 'epoch': 0.21} + + 21%|██ | 1521/7378 [5:13:20<20:09:48, 12.39s/it] + 21%|██ | 1522/7378 [5:13:33<20:02:43, 12.32s/it] + +{'loss': 0.463, 'learning_rate': 1.8415123630997254e-05, 'epoch': 0.21} + + 21%|██ | 1522/7378 [5:13:33<20:02:43, 12.32s/it] + 21%|██ | 1523/7378 [5:13:45<20:17:36, 12.48s/it] + +{'loss': 0.5187, 'learning_rate': 1.8412751093909747e-05, 'epoch': 0.21} + + 21%|██ | 1523/7378 [5:13:45<20:17:36, 12.48s/it] + 21%|██ | 1524/7378 [5:14:01<21:57:28, 13.50s/it] + +{'loss': 0.5039, 'learning_rate': 1.841037693539631e-05, 'epoch': 0.21} + + 21%|██ | 1524/7378 [5:14:01<21:57:28, 13.50s/it] + 21%|██ | 1525/7378 [5:14:14<21:31:24, 13.24s/it] + +{'loss': 0.5017, 'learning_rate': 1.840800115591452e-05, 'epoch': 0.21} + + 21%|██ | 1525/7378 [5:14:14<21:31:24, 13.24s/it] + 21%|██ | 1526/7378 [5:14:29<22:18:37, 13.72s/it] + +{'loss': 0.4928, 'learning_rate': 1.840562375592228e-05, 'epoch': 0.21} + + 21%|██ | 1526/7378 [5:14:29<22:18:37, 13.72s/it] + 21%|██ | 1527/7378 [5:14:42<21:51:02, 13.44s/it] + +{'loss': 0.4947, 'learning_rate': 1.8403244735877787e-05, 'epoch': 0.21} + + 21%|██ | 1527/7378 [5:14:42<21:51:02, 13.44s/it] + 21%|██ | 1528/7378 [5:14:54<21:08:12, 13.01s/it] + +{'loss': 0.5041, 'learning_rate': 1.840086409623957e-05, 'epoch': 0.21} + + 21%|██ | 
1528/7378 [5:14:54<21:08:12, 13.01s/it] + 21%|██ | 1529/7378 [5:15:10<22:34:35, 13.90s/it] + +{'loss': 0.4542, 'learning_rate': 1.839848183746645e-05, 'epoch': 0.21} + + 21%|██ | 1529/7378 [5:15:10<22:34:35, 13.90s/it] + 21%|██ | 1530/7378 [5:15:22<22:01:00, 13.55s/it] + +{'loss': 0.5358, 'learning_rate': 1.8396097960017574e-05, 'epoch': 0.21} + + 21%|██ | 1530/7378 [5:15:22<22:01:00, 13.55s/it] + 21%|██ | 1531/7378 [5:15:35<21:23:18, 13.17s/it] + +{'loss': 0.4978, 'learning_rate': 1.83937124643524e-05, 'epoch': 0.21} + + 21%|██ | 1531/7378 [5:15:35<21:23:18, 13.17s/it] + 21%|██ | 1532/7378 [5:15:47<20:57:38, 12.91s/it] + +{'loss': 0.4184, 'learning_rate': 1.839132535093069e-05, 'epoch': 0.21} + + 21%|██ | 1532/7378 [5:15:47<20:57:38, 12.91s/it] + 21%|██ | 1533/7378 [5:15:59<20:34:08, 12.67s/it] + +{'loss': 0.4225, 'learning_rate': 1.8388936620212524e-05, 'epoch': 0.21} + + 21%|██ | 1533/7378 [5:15:59<20:34:08, 12.67s/it] + 21%|██ | 1534/7378 [5:16:11<20:14:40, 12.47s/it] + +{'loss': 0.458, 'learning_rate': 1.8386546272658296e-05, 'epoch': 0.21} + + 21%|██ | 1534/7378 [5:16:11<20:14:40, 12.47s/it] + 21%|██ | 1535/7378 [5:16:23<20:10:36, 12.43s/it] + +{'loss': 0.4688, 'learning_rate': 1.8384154308728703e-05, 'epoch': 0.21} + + 21%|██ | 1535/7378 [5:16:23<20:10:36, 12.43s/it] + 21%|██ | 1536/7378 [5:16:36<20:07:47, 12.40s/it] + +{'loss': 0.476, 'learning_rate': 1.8381760728884765e-05, 'epoch': 0.21} + + 21%|██ | 1536/7378 [5:16:36<20:07:47, 12.40s/it] + 21%|██ | 1537/7378 [5:16:48<20:04:01, 12.37s/it] + +{'loss': 0.5169, 'learning_rate': 1.83793655335878e-05, 'epoch': 0.21} + + 21%|██ | 1537/7378 [5:16:48<20:04:01, 12.37s/it] + 21%|██ | 1538/7378 [5:17:00<20:07:06, 12.40s/it] + +{'loss': 0.4705, 'learning_rate': 1.8376968723299444e-05, 'epoch': 0.21} + + 21%|██ | 1538/7378 [5:17:00<20:07:06, 12.40s/it] + 21%|██ | 1539/7378 [5:17:12<19:51:28, 12.24s/it] + +{'loss': 0.4996, 'learning_rate': 1.837457029848165e-05, 'epoch': 0.21} + + 21%|██ | 1539/7378 [5:17:12<19:51:28, 
12.24s/it] + 21%|██ | 1540/7378 [5:17:25<20:06:10, 12.40s/it] + +{'loss': 0.5383, 'learning_rate': 1.8372170259596677e-05, 'epoch': 0.21} + + 21%|██ | 1540/7378 [5:17:25<20:06:10, 12.40s/it] + 21%|██ | 1541/7378 [5:17:37<19:57:11, 12.31s/it] + +{'loss': 0.5228, 'learning_rate': 1.836976860710709e-05, 'epoch': 0.21} + + 21%|██ | 1541/7378 [5:17:37<19:57:11, 12.31s/it] + 21%|██ | 1542/7378 [5:17:50<19:58:00, 12.32s/it] + +{'loss': 0.4322, 'learning_rate': 1.8367365341475777e-05, 'epoch': 0.21} + + 21%|██ | 1542/7378 [5:17:50<19:58:00, 12.32s/it] + 21%|██ | 1543/7378 [5:18:01<19:38:45, 12.12s/it] + +{'loss': 0.4855, 'learning_rate': 1.8364960463165918e-05, 'epoch': 0.21} + + 21%|██ | 1543/7378 [5:18:01<19:38:45, 12.12s/it] + 21%|██ | 1544/7378 [5:18:14<19:46:26, 12.20s/it] + +{'loss': 0.5187, 'learning_rate': 1.836255397264103e-05, 'epoch': 0.21} + + 21%|██ | 1544/7378 [5:18:14<19:46:26, 12.20s/it] + 21%|██ | 1545/7378 [5:18:26<19:51:44, 12.26s/it] + +{'loss': 0.4844, 'learning_rate': 1.8360145870364917e-05, 'epoch': 0.21} + + 21%|██ | 1545/7378 [5:18:26<19:51:44, 12.26s/it] + 21%|██ | 1546/7378 [5:18:38<19:53:56, 12.28s/it] + +{'loss': 0.486, 'learning_rate': 1.8357736156801703e-05, 'epoch': 0.21} + + 21%|██ | 1546/7378 [5:18:38<19:53:56, 12.28s/it] + 21%|██ | 1547/7378 [5:18:51<19:58:43, 12.33s/it] + +{'loss': 0.5155, 'learning_rate': 1.8355324832415828e-05, 'epoch': 0.21} + + 21%|██ | 1547/7378 [5:18:51<19:58:43, 12.33s/it] + 21%|██ | 1548/7378 [5:19:03<19:57:13, 12.32s/it] + +{'loss': 0.5611, 'learning_rate': 1.8352911897672028e-05, 'epoch': 0.21} + + 21%|██ | 1548/7378 [5:19:03<19:57:13, 12.32s/it] + 21%|██ | 1549/7378 [5:19:16<20:06:08, 12.42s/it] + +{'loss': 0.4575, 'learning_rate': 1.835049735303537e-05, 'epoch': 0.21} + + 21%|██ | 1549/7378 [5:19:16<20:06:08, 12.42s/it] + 21%|██ | 1550/7378 [5:19:28<20:05:27, 12.41s/it] + +{'loss': 0.5189, 'learning_rate': 1.834808119897121e-05, 'epoch': 0.21} + + 21%|██ | 1550/7378 [5:19:28<20:05:27, 12.41s/it] + 21%|██ | 
1551/7378 [5:19:40<20:05:01, 12.41s/it] + +{'loss': 0.48, 'learning_rate': 1.834566343594523e-05, 'epoch': 0.21} + + 21%|██ | 1551/7378 [5:19:40<20:05:01, 12.41s/it] + 21%|██ | 1552/7378 [5:19:53<20:09:21, 12.45s/it] + +{'loss': 0.4835, 'learning_rate': 1.834324406442341e-05, 'epoch': 0.21} + + 21%|██ | 1552/7378 [5:19:53<20:09:21, 12.45s/it] + 21%|██ | 1553/7378 [5:20:05<19:58:33, 12.35s/it] + +{'loss': 0.4947, 'learning_rate': 1.8340823084872053e-05, 'epoch': 0.21} + + 21%|██ | 1553/7378 [5:20:05<19:58:33, 12.35s/it] + 21%|██ | 1554/7378 [5:20:17<19:52:22, 12.28s/it] + +{'loss': 0.416, 'learning_rate': 1.8338400497757757e-05, 'epoch': 0.21} + + 21%|██ | 1554/7378 [5:20:17<19:52:22, 12.28s/it] + 21%|██ | 1555/7378 [5:20:30<19:53:49, 12.30s/it] + +{'loss': 0.5052, 'learning_rate': 1.8335976303547446e-05, 'epoch': 0.21} + + 21%|██ | 1555/7378 [5:20:30<19:53:49, 12.30s/it] + 21%|██ | 1556/7378 [5:20:42<20:02:32, 12.39s/it] + +{'loss': 0.4747, 'learning_rate': 1.833355050270834e-05, 'epoch': 0.21} + + 21%|██ | 1556/7378 [5:20:42<20:02:32, 12.39s/it] + 21%|██ | 1557/7378 [5:20:55<20:00:17, 12.37s/it] + +{'loss': 0.5074, 'learning_rate': 1.8331123095707975e-05, 'epoch': 0.21} + + 21%|██ | 1557/7378 [5:20:55<20:00:17, 12.37s/it] + 21%|██ | 1558/7378 [5:21:07<19:56:06, 12.33s/it] + +{'loss': 0.5055, 'learning_rate': 1.8328694083014196e-05, 'epoch': 0.21} + + 21%|██ | 1558/7378 [5:21:07<19:56:06, 12.33s/it] + 21%|██ | 1559/7378 [5:21:19<19:55:04, 12.32s/it] + +{'loss': 0.4685, 'learning_rate': 1.832626346509516e-05, 'epoch': 0.21} + + 21%|██ | 1559/7378 [5:21:19<19:55:04, 12.32s/it] + 21%|██ | 1560/7378 [5:21:32<20:03:32, 12.41s/it] + +{'loss': 0.4882, 'learning_rate': 1.8323831242419322e-05, 'epoch': 0.21} + + 21%|██ | 1560/7378 [5:21:32<20:03:32, 12.41s/it] + 21%|██ | 1561/7378 [5:21:44<20:05:01, 12.43s/it] + +{'loss': 0.5383, 'learning_rate': 1.8321397415455467e-05, 'epoch': 0.21} + + 21%|██ | 1561/7378 [5:21:44<20:05:01, 12.43s/it] + 21%|██ | 1562/7378 
[5:21:57<20:04:15, 12.42s/it] + +{'loss': 0.5152, 'learning_rate': 1.8318961984672666e-05, 'epoch': 0.21} + + 21%|██ | 1562/7378 [5:21:57<20:04:15, 12.42s/it] + 21%|██ | 1563/7378 [5:22:09<20:01:26, 12.40s/it] + +{'loss': 0.5397, 'learning_rate': 1.8316524950540318e-05, 'epoch': 0.21} + + 21%|██ | 1563/7378 [5:22:09<20:01:26, 12.40s/it] + 21%|██ | 1564/7378 [5:22:21<19:59:35, 12.38s/it] + +{'loss': 0.4634, 'learning_rate': 1.8314086313528117e-05, 'epoch': 0.21} + + 21%|██ | 1564/7378 [5:22:21<19:59:35, 12.38s/it] + 21%|██ | 1565/7378 [5:22:34<19:57:41, 12.36s/it] + +{'loss': 0.4897, 'learning_rate': 1.8311646074106074e-05, 'epoch': 0.21} + + 21%|██ | 1565/7378 [5:22:34<19:57:41, 12.36s/it] + 21%|██ | 1566/7378 [5:22:46<19:57:09, 12.36s/it] + +{'loss': 0.4668, 'learning_rate': 1.830920423274451e-05, 'epoch': 0.21} + + 21%|██ | 1566/7378 [5:22:46<19:57:09, 12.36s/it] + 21%|██ | 1567/7378 [5:22:59<20:03:31, 12.43s/it] + +{'loss': 0.5476, 'learning_rate': 1.8306760789914052e-05, 'epoch': 0.21} + + 21%|██ | 1567/7378 [5:22:59<20:03:31, 12.43s/it] + 21%|██▏ | 1568/7378 [5:23:11<20:07:59, 12.48s/it] + +{'loss': 0.4941, 'learning_rate': 1.830431574608563e-05, 'epoch': 0.21} + + 21%|██▏ | 1568/7378 [5:23:11<20:07:59, 12.48s/it] + 21%|██▏ | 1569/7378 [5:23:23<19:54:44, 12.34s/it] + +{'loss': 0.4324, 'learning_rate': 1.830186910173049e-05, 'epoch': 0.21} + + 21%|██▏ | 1569/7378 [5:23:23<19:54:44, 12.34s/it] + 21%|██▏ | 1570/7378 [5:23:35<19:54:55, 12.34s/it] + +{'loss': 0.5032, 'learning_rate': 1.8299420857320184e-05, 'epoch': 0.21} + + 21%|██▏ | 1570/7378 [5:23:36<19:54:55, 12.34s/it] + 21%|██▏ | 1571/7378 [5:23:47<19:44:44, 12.24s/it] + +{'loss': 0.4429, 'learning_rate': 1.8296971013326578e-05, 'epoch': 0.21} + + 21%|██▏ | 1571/7378 [5:23:48<19:44:44, 12.24s/it] + 21%|██▏ | 1572/7378 [5:24:00<19:45:07, 12.25s/it] + +{'loss': 0.4977, 'learning_rate': 1.8294519570221832e-05, 'epoch': 0.21} + + 21%|██▏ | 1572/7378 [5:24:00<19:45:07, 12.25s/it] + 21%|██▏ | 1573/7378 
[5:24:12<19:40:48, 12.20s/it] + +{'loss': 0.4287, 'learning_rate': 1.8292066528478432e-05, 'epoch': 0.21} + + 21%|██▏ | 1573/7378 [5:24:12<19:40:48, 12.20s/it] + 21%|██▏ | 1574/7378 [5:24:24<19:48:43, 12.29s/it] + +{'loss': 0.5576, 'learning_rate': 1.8289611888569158e-05, 'epoch': 0.21} + + 21%|██▏ | 1574/7378 [5:24:24<19:48:43, 12.29s/it] + 21%|██▏ | 1575/7378 [5:24:36<19:37:02, 12.17s/it] + +{'loss': 0.471, 'learning_rate': 1.8287155650967104e-05, 'epoch': 0.21} + + 21%|██▏ | 1575/7378 [5:24:36<19:37:02, 12.17s/it] + 21%|██▏ | 1576/7378 [5:24:49<19:44:32, 12.25s/it] + +{'loss': 0.4929, 'learning_rate': 1.828469781614567e-05, 'epoch': 0.21} + + 21%|██▏ | 1576/7378 [5:24:49<19:44:32, 12.25s/it] + 21%|██▏ | 1577/7378 [5:25:01<19:45:18, 12.26s/it] + +{'loss': 0.5493, 'learning_rate': 1.828223838457857e-05, 'epoch': 0.21} + + 21%|██▏ | 1577/7378 [5:25:01<19:45:18, 12.26s/it] + 21%|██▏ | 1578/7378 [5:25:13<19:44:34, 12.25s/it] + +{'loss': 0.4431, 'learning_rate': 1.827977735673982e-05, 'epoch': 0.21} + + 21%|██▏ | 1578/7378 [5:25:13<19:44:34, 12.25s/it] + 21%|██▏ | 1579/7378 [5:25:26<19:51:51, 12.33s/it] + +{'loss': 0.4793, 'learning_rate': 1.827731473310374e-05, 'epoch': 0.21} + + 21%|██▏ | 1579/7378 [5:25:26<19:51:51, 12.33s/it] + 21%|██▏ | 1580/7378 [5:25:38<19:52:17, 12.34s/it] + +{'loss': 0.5053, 'learning_rate': 1.8274850514144967e-05, 'epoch': 0.21} + + 21%|██▏ | 1580/7378 [5:25:38<19:52:17, 12.34s/it] + 21%|██▏ | 1581/7378 [5:25:50<19:52:05, 12.34s/it] + +{'loss': 0.4744, 'learning_rate': 1.8272384700338436e-05, 'epoch': 0.21} + + 21%|██▏ | 1581/7378 [5:25:50<19:52:05, 12.34s/it] + 21%|██▏ | 1582/7378 [5:26:03<19:49:42, 12.32s/it] + +{'loss': 0.4308, 'learning_rate': 1.8269917292159393e-05, 'epoch': 0.21} + + 21%|██▏ | 1582/7378 [5:26:03<19:49:42, 12.32s/it] + 21%|██▏ | 1583/7378 [5:26:14<19:33:32, 12.15s/it] + +{'loss': 0.481, 'learning_rate': 1.82674482900834e-05, 'epoch': 0.21} + + 21%|██▏ | 1583/7378 [5:26:14<19:33:32, 12.15s/it] + 21%|██▏ | 1584/7378 
[5:26:26<19:27:44, 12.09s/it] + +{'loss': 0.4713, 'learning_rate': 1.8264977694586315e-05, 'epoch': 0.21} + + 21%|██▏ | 1584/7378 [5:26:26<19:27:44, 12.09s/it] + 21%|██▏ | 1585/7378 [5:26:39<19:42:01, 12.24s/it] + +{'loss': 0.5005, 'learning_rate': 1.8262505506144304e-05, 'epoch': 0.21} + + 21%|██▏ | 1585/7378 [5:26:39<19:42:01, 12.24s/it] + 21%|██▏ | 1586/7378 [5:26:51<19:41:34, 12.24s/it] + +{'loss': 0.451, 'learning_rate': 1.826003172523384e-05, 'epoch': 0.21} + + 21%|██▏ | 1586/7378 [5:26:51<19:41:34, 12.24s/it] + 22%|██▏ | 1587/7378 [5:27:04<19:50:26, 12.33s/it] + +{'loss': 0.4735, 'learning_rate': 1.8257556352331715e-05, 'epoch': 0.22} + + 22%|██▏ | 1587/7378 [5:27:04<19:50:26, 12.33s/it] + 22%|██▏ | 1588/7378 [5:27:16<19:50:26, 12.34s/it] + +{'loss': 0.482, 'learning_rate': 1.8255079387915015e-05, 'epoch': 0.22} + + 22%|██▏ | 1588/7378 [5:27:16<19:50:26, 12.34s/it] + 22%|██▏ | 1589/7378 [5:27:28<19:47:37, 12.31s/it] + +{'loss': 0.4387, 'learning_rate': 1.825260083246113e-05, 'epoch': 0.22} + + 22%|██▏ | 1589/7378 [5:27:28<19:47:37, 12.31s/it] + 22%|██▏ | 1590/7378 [5:27:41<19:49:08, 12.33s/it] + +{'loss': 0.5192, 'learning_rate': 1.8250120686447767e-05, 'epoch': 0.22} + + 22%|██▏ | 1590/7378 [5:27:41<19:49:08, 12.33s/it] + 22%|██▏ | 1591/7378 [5:27:53<19:33:35, 12.17s/it] + +{'loss': 0.4922, 'learning_rate': 1.8247638950352934e-05, 'epoch': 0.22} + + 22%|██▏ | 1591/7378 [5:27:53<19:33:35, 12.17s/it] + 22%|██▏ | 1592/7378 [5:28:05<19:41:12, 12.25s/it] + +{'loss': 0.4995, 'learning_rate': 1.824515562465495e-05, 'epoch': 0.22} + + 22%|██▏ | 1592/7378 [5:28:05<19:41:12, 12.25s/it] + 22%|██▏ | 1593/7378 [5:28:17<19:43:22, 12.27s/it] + +{'loss': 0.542, 'learning_rate': 1.8242670709832436e-05, 'epoch': 0.22} + + 22%|██▏ | 1593/7378 [5:28:17<19:43:22, 12.27s/it] + 22%|██▏ | 1594/7378 [5:28:30<19:50:57, 12.35s/it] + +{'loss': 0.4393, 'learning_rate': 1.824018420636432e-05, 'epoch': 0.22} + + 22%|██▏ | 1594/7378 [5:28:30<19:50:57, 12.35s/it] + 22%|██▏ | 1595/7378 
[5:28:42<19:33:05, 12.17s/it] + +{'loss': 0.5377, 'learning_rate': 1.823769611472983e-05, 'epoch': 0.22} + + 22%|██▏ | 1595/7378 [5:28:42<19:33:05, 12.17s/it] + 22%|██▏ | 1596/7378 [5:28:54<19:31:58, 12.16s/it] + +{'loss': 0.5115, 'learning_rate': 1.8235206435408516e-05, 'epoch': 0.22} + + 22%|██▏ | 1596/7378 [5:28:54<19:31:58, 12.16s/it] + 22%|██▏ | 1597/7378 [5:29:06<19:33:21, 12.18s/it] + +{'loss': 0.5392, 'learning_rate': 1.8232715168880223e-05, 'epoch': 0.22} + + 22%|██▏ | 1597/7378 [5:29:06<19:33:21, 12.18s/it] + 22%|██▏ | 1598/7378 [5:29:18<19:37:49, 12.23s/it] + +{'loss': 0.5719, 'learning_rate': 1.82302223156251e-05, 'epoch': 0.22} + + 22%|██▏ | 1598/7378 [5:29:18<19:37:49, 12.23s/it] + 22%|██▏ | 1599/7378 [5:29:30<19:35:30, 12.20s/it] + +{'loss': 0.4858, 'learning_rate': 1.8227727876123605e-05, 'epoch': 0.22} + + 22%|██▏ | 1599/7378 [5:29:30<19:35:30, 12.20s/it] + 22%|██▏ | 1600/7378 [5:29:43<19:31:32, 12.17s/it] + +{'loss': 0.5114, 'learning_rate': 1.822523185085651e-05, 'epoch': 0.22} + + 22%|██▏ | 1600/7378 [5:29:43<19:31:32, 12.17s/it] + 22%|██▏ | 1601/7378 [5:29:54<19:25:13, 12.10s/it] + +{'loss': 0.4922, 'learning_rate': 1.8222734240304874e-05, 'epoch': 0.22} + + 22%|██▏ | 1601/7378 [5:29:54<19:25:13, 12.10s/it] + 22%|██▏ | 1602/7378 [5:30:07<19:34:55, 12.20s/it] + +{'loss': 0.508, 'learning_rate': 1.8220235044950078e-05, 'epoch': 0.22} + + 22%|██▏ | 1602/7378 [5:30:07<19:34:55, 12.20s/it] + 22%|██▏ | 1603/7378 [5:30:19<19:31:38, 12.17s/it] + +{'loss': 0.503, 'learning_rate': 1.8217734265273802e-05, 'epoch': 0.22} + + 22%|██▏ | 1603/7378 [5:30:19<19:31:38, 12.17s/it] + 22%|██▏ | 1604/7378 [5:30:31<19:29:25, 12.15s/it] + +{'loss': 0.4453, 'learning_rate': 1.8215231901758034e-05, 'epoch': 0.22} + + 22%|██▏ | 1604/7378 [5:30:31<19:29:25, 12.15s/it] + 22%|██▏ | 1605/7378 [5:30:43<19:35:28, 12.22s/it] + +{'loss': 0.471, 'learning_rate': 1.8212727954885063e-05, 'epoch': 0.22} + + 22%|██▏ | 1605/7378 [5:30:43<19:35:28, 12.22s/it] + 22%|██▏ | 1606/7378 
[5:30:56<19:40:48, 12.27s/it] + +{'loss': 0.4387, 'learning_rate': 1.8210222425137485e-05, 'epoch': 0.22} + + 22%|██▏ | 1606/7378 [5:30:56<19:40:48, 12.27s/it] + 22%|██▏ | 1607/7378 [5:31:08<19:48:15, 12.35s/it] + +{'loss': 0.4888, 'learning_rate': 1.8207715312998203e-05, 'epoch': 0.22} + + 22%|██▏ | 1607/7378 [5:31:09<19:48:15, 12.35s/it] + 22%|██▏ | 1608/7378 [5:31:21<20:04:22, 12.52s/it] + +{'loss': 0.439, 'learning_rate': 1.8205206618950427e-05, 'epoch': 0.22} + + 22%|██▏ | 1608/7378 [5:31:21<20:04:22, 12.52s/it] + 22%|██▏ | 1609/7378 [5:31:34<20:14:07, 12.63s/it] + +{'loss': 0.5417, 'learning_rate': 1.820269634347766e-05, 'epoch': 0.22} + + 22%|██▏ | 1609/7378 [5:31:34<20:14:07, 12.63s/it] + 22%|██▏ | 1610/7378 [5:31:46<19:59:27, 12.48s/it] + +{'loss': 0.4176, 'learning_rate': 1.8200184487063727e-05, 'epoch': 0.22} + + 22%|██▏ | 1610/7378 [5:31:46<19:59:27, 12.48s/it] + 22%|██▏ | 1611/7378 [5:31:59<19:53:40, 12.42s/it] + +{'loss': 0.5095, 'learning_rate': 1.819767105019274e-05, 'epoch': 0.22} + + 22%|██▏ | 1611/7378 [5:31:59<19:53:40, 12.42s/it] + 22%|██▏ | 1612/7378 [5:32:11<19:49:39, 12.38s/it] + +{'loss': 0.5167, 'learning_rate': 1.819515603334913e-05, 'epoch': 0.22} + + 22%|██▏ | 1612/7378 [5:32:11<19:49:39, 12.38s/it] + 22%|██▏ | 1613/7378 [5:32:23<19:43:10, 12.31s/it] + +{'loss': 0.4583, 'learning_rate': 1.819263943701763e-05, 'epoch': 0.22} + + 22%|██▏ | 1613/7378 [5:32:23<19:43:10, 12.31s/it] + 22%|██▏ | 1614/7378 [5:32:36<19:54:33, 12.43s/it] + +{'loss': 0.5059, 'learning_rate': 1.8190121261683268e-05, 'epoch': 0.22} + + 22%|██▏ | 1614/7378 [5:32:36<19:54:33, 12.43s/it] + 22%|██▏ | 1615/7378 [5:32:48<19:44:36, 12.33s/it] + +{'loss': 0.4678, 'learning_rate': 1.8187601507831388e-05, 'epoch': 0.22} + + 22%|██▏ | 1615/7378 [5:32:48<19:44:36, 12.33s/it] + 22%|██▏ | 1616/7378 [5:33:01<19:56:26, 12.46s/it] + +{'loss': 0.5469, 'learning_rate': 1.818508017594763e-05, 'epoch': 0.22} + + 22%|██▏ | 1616/7378 [5:33:01<19:56:26, 12.46s/it] + 22%|██▏ | 1617/7378 
[5:33:13<19:59:03, 12.49s/it] + +{'loss': 0.5503, 'learning_rate': 1.8182557266517945e-05, 'epoch': 0.22} + + 22%|██▏ | 1617/7378 [5:33:13<19:59:03, 12.49s/it] + 22%|██▏ | 1618/7378 [5:33:26<19:54:41, 12.44s/it] + +{'loss': 0.4708, 'learning_rate': 1.818003278002858e-05, 'epoch': 0.22} + + 22%|██▏ | 1618/7378 [5:33:26<19:54:41, 12.44s/it] + 22%|██▏ | 1619/7378 [5:33:38<19:48:02, 12.38s/it] + +{'loss': 0.5562, 'learning_rate': 1.8177506716966088e-05, 'epoch': 0.22} + + 22%|██▏ | 1619/7378 [5:33:38<19:48:02, 12.38s/it] + 22%|██▏ | 1620/7378 [5:33:50<19:51:02, 12.41s/it] + +{'loss': 0.4314, 'learning_rate': 1.8174979077817338e-05, 'epoch': 0.22} + + 22%|██▏ | 1620/7378 [5:33:50<19:51:02, 12.41s/it] + 22%|██▏ | 1621/7378 [5:34:03<19:59:30, 12.50s/it] + +{'loss': 0.4839, 'learning_rate': 1.817244986306948e-05, 'epoch': 0.22} + + 22%|██▏ | 1621/7378 [5:34:03<19:59:30, 12.50s/it] + 22%|██▏ | 1622/7378 [5:34:15<19:56:07, 12.47s/it] + +{'loss': 0.4553, 'learning_rate': 1.816991907320999e-05, 'epoch': 0.22} + + 22%|██▏ | 1622/7378 [5:34:15<19:56:07, 12.47s/it] + 22%|██▏ | 1623/7378 [5:34:28<20:01:45, 12.53s/it] + +{'loss': 0.4443, 'learning_rate': 1.8167386708726636e-05, 'epoch': 0.22} + + 22%|██▏ | 1623/7378 [5:34:28<20:01:45, 12.53s/it] + 22%|██▏ | 1624/7378 [5:34:40<19:44:04, 12.35s/it] + +{'loss': 0.4965, 'learning_rate': 1.8164852770107487e-05, 'epoch': 0.22} + + 22%|██▏ | 1624/7378 [5:34:40<19:44:04, 12.35s/it] + 22%|██▏ | 1625/7378 [5:34:52<19:38:00, 12.29s/it] + +{'loss': 0.5405, 'learning_rate': 1.8162317257840926e-05, 'epoch': 0.22} + + 22%|██▏ | 1625/7378 [5:34:52<19:38:00, 12.29s/it] + 22%|██▏ | 1626/7378 [5:35:05<19:43:30, 12.35s/it] + +{'loss': 0.4577, 'learning_rate': 1.8159780172415634e-05, 'epoch': 0.22} + + 22%|██▏ | 1626/7378 [5:35:05<19:43:30, 12.35s/it] + 22%|██▏ | 1627/7378 [5:35:17<19:41:15, 12.32s/it] + +{'loss': 0.4601, 'learning_rate': 1.815724151432059e-05, 'epoch': 0.22} + + 22%|██▏ | 1627/7378 [5:35:17<19:41:15, 12.32s/it] + 22%|██▏ | 1628/7378 
[5:35:29<19:37:25, 12.29s/it] + +{'loss': 0.5734, 'learning_rate': 1.815470128404508e-05, 'epoch': 0.22} + + 22%|██▏ | 1628/7378 [5:35:29<19:37:25, 12.29s/it] + 22%|██▏ | 1629/7378 [5:35:41<19:34:32, 12.26s/it] + +{'loss': 0.4611, 'learning_rate': 1.8152159482078695e-05, 'epoch': 0.22} + + 22%|██▏ | 1629/7378 [5:35:41<19:34:32, 12.26s/it] + 22%|██▏ | 1630/7378 [5:35:53<19:34:08, 12.26s/it] + +{'loss': 0.4678, 'learning_rate': 1.8149616108911327e-05, 'epoch': 0.22} + + 22%|██▏ | 1630/7378 [5:35:53<19:34:08, 12.26s/it] + 22%|██▏ | 1631/7378 [5:36:06<19:36:52, 12.29s/it] + +{'loss': 0.4751, 'learning_rate': 1.8147071165033177e-05, 'epoch': 0.22} + + 22%|██▏ | 1631/7378 [5:36:06<19:36:52, 12.29s/it] + 22%|██▏ | 1632/7378 [5:36:18<19:42:35, 12.35s/it] + +{'loss': 0.4088, 'learning_rate': 1.8144524650934735e-05, 'epoch': 0.22} + + 22%|██▏ | 1632/7378 [5:36:18<19:42:35, 12.35s/it] + 22%|██▏ | 1633/7378 [5:36:31<19:44:18, 12.37s/it] + +{'loss': 0.4684, 'learning_rate': 1.8141976567106804e-05, 'epoch': 0.22} + + 22%|██▏ | 1633/7378 [5:36:31<19:44:18, 12.37s/it] + 22%|██▏ | 1634/7378 [5:36:43<19:34:38, 12.27s/it] + +{'loss': 0.5067, 'learning_rate': 1.813942691404049e-05, 'epoch': 0.22} + + 22%|██▏ | 1634/7378 [5:36:43<19:34:38, 12.27s/it] + 22%|██▏ | 1635/7378 [5:36:55<19:35:31, 12.28s/it] + +{'loss': 0.5499, 'learning_rate': 1.8136875692227197e-05, 'epoch': 0.22} + + 22%|██▏ | 1635/7378 [5:36:55<19:35:31, 12.28s/it] + 22%|██▏ | 1636/7378 [5:37:08<19:46:04, 12.39s/it] + +{'loss': 0.5268, 'learning_rate': 1.813432290215863e-05, 'epoch': 0.22} + + 22%|██▏ | 1636/7378 [5:37:08<19:46:04, 12.39s/it] + 22%|██▏ | 1637/7378 [5:37:20<19:47:55, 12.42s/it] + +{'loss': 0.4847, 'learning_rate': 1.8131768544326808e-05, 'epoch': 0.22} + + 22%|██▏ | 1637/7378 [5:37:20<19:47:55, 12.42s/it] + 22%|██▏ | 1638/7378 [5:37:33<19:55:51, 12.50s/it] + +{'loss': 0.4254, 'learning_rate': 1.8129212619224034e-05, 'epoch': 0.22} + + 22%|██▏ | 1638/7378 [5:37:33<19:55:51, 12.50s/it] + 22%|██▏ | 1639/7378 
[5:37:45<19:47:03, 12.41s/it] + +{'loss': 0.4886, 'learning_rate': 1.8126655127342927e-05, 'epoch': 0.22} + + 22%|██▏ | 1639/7378 [5:37:45<19:47:03, 12.41s/it] + 22%|██▏ | 1640/7378 [5:37:57<19:45:29, 12.40s/it] + +{'loss': 0.498, 'learning_rate': 1.8124096069176405e-05, 'epoch': 0.22} + + 22%|██▏ | 1640/7378 [5:37:57<19:45:29, 12.40s/it] + 22%|██▏ | 1641/7378 [5:38:10<19:37:53, 12.32s/it] + +{'loss': 0.5514, 'learning_rate': 1.812153544521768e-05, 'epoch': 0.22} + + 22%|██▏ | 1641/7378 [5:38:10<19:37:53, 12.32s/it] + 22%|██▏ | 1642/7378 [5:38:22<19:48:49, 12.44s/it] + +{'loss': 0.5976, 'learning_rate': 1.811897325596028e-05, 'epoch': 0.22} + + 22%|██▏ | 1642/7378 [5:38:22<19:48:49, 12.44s/it] + 22%|██▏ | 1643/7378 [5:38:35<19:56:54, 12.52s/it] + +{'loss': 0.4976, 'learning_rate': 1.811640950189802e-05, 'epoch': 0.22} + + 22%|██▏ | 1643/7378 [5:38:35<19:56:54, 12.52s/it] + 22%|██▏ | 1644/7378 [5:38:47<19:53:38, 12.49s/it] + +{'loss': 0.4413, 'learning_rate': 1.8113844183525026e-05, 'epoch': 0.22} + + 22%|██▏ | 1644/7378 [5:38:47<19:53:38, 12.49s/it] + 22%|██▏ | 1645/7378 [5:39:00<19:40:34, 12.36s/it] + +{'loss': 0.4353, 'learning_rate': 1.8111277301335723e-05, 'epoch': 0.22} + + 22%|██▏ | 1645/7378 [5:39:00<19:40:34, 12.36s/it] + 22%|██▏ | 1646/7378 [5:39:12<19:38:03, 12.33s/it] + +{'loss': 0.527, 'learning_rate': 1.8108708855824838e-05, 'epoch': 0.22} + + 22%|██▏ | 1646/7378 [5:39:12<19:38:03, 12.33s/it] + 22%|██▏ | 1647/7378 [5:39:24<19:36:34, 12.32s/it] + +{'loss': 0.4536, 'learning_rate': 1.81061388474874e-05, 'epoch': 0.22} + + 22%|██▏ | 1647/7378 [5:39:24<19:36:34, 12.32s/it] + 22%|██▏ | 1648/7378 [5:39:36<19:31:16, 12.26s/it] + +{'loss': 0.5001, 'learning_rate': 1.8103567276818736e-05, 'epoch': 0.22} + + 22%|██▏ | 1648/7378 [5:39:36<19:31:16, 12.26s/it] + 22%|██▏ | 1649/7378 [5:39:49<19:33:24, 12.29s/it] + +{'loss': 0.4766, 'learning_rate': 1.8100994144314477e-05, 'epoch': 0.22} + + 22%|██▏ | 1649/7378 [5:39:49<19:33:24, 12.29s/it] + 22%|██▏ | 1650/7378 
[5:40:01<19:39:40, 12.36s/it] + +{'loss': 0.4599, 'learning_rate': 1.809841945047055e-05, 'epoch': 0.22} + + 22%|██▏ | 1650/7378 [5:40:01<19:39:40, 12.36s/it] + 22%|██▏ | 1651/7378 [5:40:14<19:47:04, 12.44s/it] + +{'loss': 0.4909, 'learning_rate': 1.809584319578319e-05, 'epoch': 0.22} + + 22%|██▏ | 1651/7378 [5:40:14<19:47:04, 12.44s/it] + 22%|██▏ | 1652/7378 [5:40:26<19:42:30, 12.39s/it] + +{'loss': 0.4815, 'learning_rate': 1.8093265380748932e-05, 'epoch': 0.22} + + 22%|██▏ | 1652/7378 [5:40:26<19:42:30, 12.39s/it] + 22%|██▏ | 1653/7378 [5:40:38<19:37:03, 12.34s/it] + +{'loss': 0.5079, 'learning_rate': 1.8090686005864607e-05, 'epoch': 0.22} + + 22%|██▏ | 1653/7378 [5:40:38<19:37:03, 12.34s/it] + 22%|██▏ | 1654/7378 [5:40:50<19:25:56, 12.22s/it] + +{'loss': 0.4269, 'learning_rate': 1.808810507162735e-05, 'epoch': 0.22} + + 22%|██▏ | 1654/7378 [5:40:50<19:25:56, 12.22s/it] + 22%|██▏ | 1655/7378 [5:41:03<19:30:04, 12.27s/it] + +{'loss': 0.4983, 'learning_rate': 1.8085522578534587e-05, 'epoch': 0.22} + + 22%|██▏ | 1655/7378 [5:41:03<19:30:04, 12.27s/it] + 22%|██▏ | 1656/7378 [5:41:15<19:30:59, 12.28s/it] + +{'loss': 0.4566, 'learning_rate': 1.808293852708407e-05, 'epoch': 0.22} + + 22%|██▏ | 1656/7378 [5:41:15<19:30:59, 12.28s/it] + 22%|██▏ | 1657/7378 [5:41:27<19:34:18, 12.32s/it] + +{'loss': 0.5015, 'learning_rate': 1.808035291777382e-05, 'epoch': 0.22} + + 22%|██▏ | 1657/7378 [5:41:27<19:34:18, 12.32s/it] + 22%|██▏ | 1658/7378 [5:41:40<19:37:03, 12.35s/it] + +{'loss': 0.4711, 'learning_rate': 1.8077765751102183e-05, 'epoch': 0.22} + + 22%|██▏ | 1658/7378 [5:41:40<19:37:03, 12.35s/it] + 22%|██▏ | 1659/7378 [5:41:52<19:33:08, 12.31s/it] + +{'loss': 0.5188, 'learning_rate': 1.8075177027567785e-05, 'epoch': 0.22} + + 22%|██▏ | 1659/7378 [5:41:52<19:33:08, 12.31s/it] + 22%|██▏ | 1660/7378 [5:42:04<19:37:47, 12.36s/it] + +{'loss': 0.4892, 'learning_rate': 1.8072586747669568e-05, 'epoch': 0.22} + + 22%|██▏ | 1660/7378 [5:42:04<19:37:47, 12.36s/it] + 23%|██▎ | 1661/7378 
[5:42:16<19:28:13, 12.26s/it] + +{'loss': 0.5107, 'learning_rate': 1.806999491190677e-05, 'epoch': 0.23} + + 23%|██▎ | 1661/7378 [5:42:16<19:28:13, 12.26s/it] + 23%|██▎ | 1662/7378 [5:42:29<19:41:18, 12.40s/it] + +{'loss': 0.5106, 'learning_rate': 1.8067401520778918e-05, 'epoch': 0.23} + + 23%|██▎ | 1662/7378 [5:42:29<19:41:18, 12.40s/it] + 23%|██▎ | 1663/7378 [5:42:41<19:34:36, 12.33s/it] + +{'loss': 0.4782, 'learning_rate': 1.8064806574785855e-05, 'epoch': 0.23} + + 23%|██▎ | 1663/7378 [5:42:41<19:34:36, 12.33s/it] + 23%|██▎ | 1664/7378 [5:42:54<19:31:45, 12.30s/it] + +{'loss': 0.5052, 'learning_rate': 1.8062210074427713e-05, 'epoch': 0.23} + + 23%|██▎ | 1664/7378 [5:42:54<19:31:45, 12.30s/it] + 23%|██▎ | 1665/7378 [5:43:06<19:35:49, 12.35s/it] + +{'loss': 0.4928, 'learning_rate': 1.805961202020493e-05, 'epoch': 0.23} + + 23%|██▎ | 1665/7378 [5:43:06<19:35:49, 12.35s/it] + 23%|██▎ | 1666/7378 [5:43:19<19:42:03, 12.42s/it] + +{'loss': 0.4932, 'learning_rate': 1.8057012412618236e-05, 'epoch': 0.23} + + 23%|██▎ | 1666/7378 [5:43:19<19:42:03, 12.42s/it] + 23%|██▎ | 1667/7378 [5:43:30<19:27:12, 12.26s/it] + +{'loss': 0.5232, 'learning_rate': 1.8054411252168665e-05, 'epoch': 0.23} + + 23%|██▎ | 1667/7378 [5:43:30<19:27:12, 12.26s/it] + 23%|██▎ | 1668/7378 [5:43:43<19:29:01, 12.28s/it] + +{'loss': 0.5044, 'learning_rate': 1.805180853935755e-05, 'epoch': 0.23} + + 23%|██▎ | 1668/7378 [5:43:43<19:29:01, 12.28s/it] + 23%|██▎ | 1669/7378 [5:43:55<19:20:43, 12.20s/it] + +{'loss': 0.4923, 'learning_rate': 1.804920427468653e-05, 'epoch': 0.23} + + 23%|██▎ | 1669/7378 [5:43:55<19:20:43, 12.20s/it] + 23%|██▎ | 1670/7378 [5:44:07<19:30:20, 12.30s/it] + +{'loss': 0.4982, 'learning_rate': 1.8046598458657528e-05, 'epoch': 0.23} + + 23%|██▎ | 1670/7378 [5:44:07<19:30:20, 12.30s/it] + 23%|██▎ | 1671/7378 [5:44:20<19:36:05, 12.36s/it] + +{'loss': 0.5137, 'learning_rate': 1.8043991091772778e-05, 'epoch': 0.23} + + 23%|██▎ | 1671/7378 [5:44:20<19:36:05, 12.36s/it] + 23%|██▎ | 1672/7378 
[5:44:32<19:27:57, 12.28s/it] + +{'loss': 0.4957, 'learning_rate': 1.804138217453481e-05, 'epoch': 0.23} + + 23%|██▎ | 1672/7378 [5:44:32<19:27:57, 12.28s/it] + 23%|██▎ | 1673/7378 [5:44:44<19:21:05, 12.21s/it] + +{'loss': 0.4747, 'learning_rate': 1.8038771707446446e-05, 'epoch': 0.23} + + 23%|██▎ | 1673/7378 [5:44:44<19:21:05, 12.21s/it] + 23%|██▎ | 1674/7378 [5:44:56<19:24:02, 12.24s/it] + +{'loss': 0.5133, 'learning_rate': 1.8036159691010816e-05, 'epoch': 0.23} + + 23%|██▎ | 1674/7378 [5:44:56<19:24:02, 12.24s/it] + 23%|██▎ | 1675/7378 [5:45:08<19:20:16, 12.21s/it] + +{'loss': 0.5122, 'learning_rate': 1.8033546125731347e-05, 'epoch': 0.23} + + 23%|██▎ | 1675/7378 [5:45:08<19:20:16, 12.21s/it] + 23%|██▎ | 1676/7378 [5:45:21<19:19:31, 12.20s/it] + +{'loss': 0.4671, 'learning_rate': 1.8030931012111767e-05, 'epoch': 0.23} + + 23%|██▎ | 1676/7378 [5:45:21<19:19:31, 12.20s/it] + 23%|██▎ | 1677/7378 [5:45:33<19:24:05, 12.25s/it] + +{'loss': 0.4475, 'learning_rate': 1.8028314350656085e-05, 'epoch': 0.23} + + 23%|██▎ | 1677/7378 [5:45:33<19:24:05, 12.25s/it] + 23%|██▎ | 1678/7378 [5:45:46<19:52:51, 12.56s/it] + +{'loss': 0.4905, 'learning_rate': 1.8025696141868635e-05, 'epoch': 0.23} + + 23%|██▎ | 1678/7378 [5:45:46<19:52:51, 12.56s/it] + 23%|██▎ | 1679/7378 [5:45:59<19:56:23, 12.60s/it] + +{'loss': 0.4825, 'learning_rate': 1.8023076386254025e-05, 'epoch': 0.23} + + 23%|██▎ | 1679/7378 [5:45:59<19:56:23, 12.60s/it] + 23%|██▎ | 1680/7378 [5:46:12<19:56:34, 12.60s/it] + +{'loss': 0.4264, 'learning_rate': 1.8020455084317178e-05, 'epoch': 0.23} + + 23%|██▎ | 1680/7378 [5:46:12<19:56:34, 12.60s/it] + 23%|██▎ | 1681/7378 [5:46:24<19:42:27, 12.45s/it] + +{'loss': 0.4336, 'learning_rate': 1.801783223656331e-05, 'epoch': 0.23} + + 23%|██▎ | 1681/7378 [5:46:24<19:42:27, 12.45s/it] + 23%|██▎ | 1682/7378 [5:46:36<19:49:45, 12.53s/it] + +{'loss': 0.4667, 'learning_rate': 1.801520784349793e-05, 'epoch': 0.23} + + 23%|██▎ | 1682/7378 [5:46:36<19:49:45, 12.53s/it] + 23%|██▎ | 1683/7378 
[5:46:49<19:55:51, 12.60s/it] + +{'loss': 0.5538, 'learning_rate': 1.8012581905626847e-05, 'epoch': 0.23} + + 23%|██▎ | 1683/7378 [5:46:49<19:55:51, 12.60s/it] + 23%|██▎ | 1684/7378 [5:47:02<19:56:39, 12.61s/it] + +{'loss': 0.4703, 'learning_rate': 1.8009954423456175e-05, 'epoch': 0.23} + + 23%|██▎ | 1684/7378 [5:47:02<19:56:39, 12.61s/it] + 23%|██▎ | 1685/7378 [5:47:14<19:59:13, 12.64s/it] + +{'loss': 0.447, 'learning_rate': 1.800732539749232e-05, 'epoch': 0.23} + + 23%|██▎ | 1685/7378 [5:47:14<19:59:13, 12.64s/it] + 23%|██▎ | 1686/7378 [5:47:27<19:53:25, 12.58s/it] + +{'loss': 0.4295, 'learning_rate': 1.800469482824198e-05, 'epoch': 0.23} + + 23%|██▎ | 1686/7378 [5:47:27<19:53:25, 12.58s/it] + 23%|██▎ | 1687/7378 [5:47:39<19:50:53, 12.56s/it] + +{'loss': 0.4875, 'learning_rate': 1.8002062716212162e-05, 'epoch': 0.23} + + 23%|██▎ | 1687/7378 [5:47:39<19:50:53, 12.56s/it] + 23%|██▎ | 1688/7378 [5:47:52<19:41:26, 12.46s/it] + +{'loss': 0.5513, 'learning_rate': 1.799942906191016e-05, 'epoch': 0.23} + + 23%|██▎ | 1688/7378 [5:47:52<19:41:26, 12.46s/it] + 23%|██▎ | 1689/7378 [5:48:04<19:45:44, 12.51s/it] + +{'loss': 0.4797, 'learning_rate': 1.7996793865843568e-05, 'epoch': 0.23} + + 23%|██▎ | 1689/7378 [5:48:04<19:45:44, 12.51s/it] + 23%|██▎ | 1690/7378 [5:48:17<19:43:28, 12.48s/it] + +{'loss': 0.5349, 'learning_rate': 1.7994157128520282e-05, 'epoch': 0.23} + + 23%|██▎ | 1690/7378 [5:48:17<19:43:28, 12.48s/it] + 23%|██▎ | 1691/7378 [5:48:29<19:28:28, 12.33s/it] + +{'loss': 0.516, 'learning_rate': 1.7991518850448494e-05, 'epoch': 0.23} + + 23%|██▎ | 1691/7378 [5:48:29<19:28:28, 12.33s/it] + 23%|██▎ | 1692/7378 [5:48:41<19:28:08, 12.33s/it] + +{'loss': 0.4832, 'learning_rate': 1.7988879032136687e-05, 'epoch': 0.23} + + 23%|██▎ | 1692/7378 [5:48:41<19:28:08, 12.33s/it] + 23%|██▎ | 1693/7378 [5:48:53<19:20:49, 12.25s/it] + +{'loss': 0.4676, 'learning_rate': 1.7986237674093646e-05, 'epoch': 0.23} + + 23%|██▎ | 1693/7378 [5:48:53<19:20:49, 12.25s/it] + 23%|██▎ | 1694/7378 
[5:49:06<19:28:21, 12.33s/it] + +{'loss': 0.4802, 'learning_rate': 1.798359477682845e-05, 'epoch': 0.23} + + 23%|██▎ | 1694/7378 [5:49:06<19:28:21, 12.33s/it] + 23%|██▎ | 1695/7378 [5:49:18<19:19:03, 12.24s/it] + +{'loss': 0.4324, 'learning_rate': 1.798095034085048e-05, 'epoch': 0.23} + + 23%|██▎ | 1695/7378 [5:49:18<19:19:03, 12.24s/it] + 23%|██▎ | 1696/7378 [5:49:30<19:21:50, 12.27s/it] + +{'loss': 0.4127, 'learning_rate': 1.7978304366669407e-05, 'epoch': 0.23} + + 23%|██▎ | 1696/7378 [5:49:30<19:21:50, 12.27s/it] + 23%|██▎ | 1697/7378 [5:49:42<19:25:48, 12.31s/it] + +{'loss': 0.4972, 'learning_rate': 1.79756568547952e-05, 'epoch': 0.23} + + 23%|██▎ | 1697/7378 [5:49:42<19:25:48, 12.31s/it] + 23%|██▎ | 1698/7378 [5:49:55<19:27:34, 12.33s/it] + +{'loss': 0.4917, 'learning_rate': 1.7973007805738124e-05, 'epoch': 0.23} + + 23%|██▎ | 1698/7378 [5:49:55<19:27:34, 12.33s/it] + 23%|██▎ | 1699/7378 [5:50:07<19:32:28, 12.39s/it] + +{'loss': 0.4199, 'learning_rate': 1.7970357220008747e-05, 'epoch': 0.23} + + 23%|██▎ | 1699/7378 [5:50:07<19:32:28, 12.39s/it] + 23%|██▎ | 1700/7378 [5:50:20<19:33:27, 12.40s/it] + +{'loss': 0.433, 'learning_rate': 1.7967705098117923e-05, 'epoch': 0.23} + + 23%|██▎ | 1700/7378 [5:50:20<19:33:27, 12.40s/it] + 23%|██▎ | 1701/7378 [5:50:32<19:32:26, 12.39s/it] + +{'loss': 0.5321, 'learning_rate': 1.796505144057681e-05, 'epoch': 0.23} + + 23%|██▎ | 1701/7378 [5:50:32<19:32:26, 12.39s/it] + 23%|██▎ | 1702/7378 [5:50:44<19:15:49, 12.22s/it] + +{'loss': 0.4921, 'learning_rate': 1.7962396247896855e-05, 'epoch': 0.23} + + 23%|██▎ | 1702/7378 [5:50:44<19:15:49, 12.22s/it] + 23%|██▎ | 1703/7378 [5:50:56<19:15:45, 12.22s/it] + +{'loss': 0.5318, 'learning_rate': 1.795973952058981e-05, 'epoch': 0.23} + + 23%|██▎ | 1703/7378 [5:50:56<19:15:45, 12.22s/it] + 23%|██▎ | 1704/7378 [5:51:08<19:18:20, 12.25s/it] + +{'loss': 0.4398, 'learning_rate': 1.7957081259167714e-05, 'epoch': 0.23} + + 23%|██▎ | 1704/7378 [5:51:08<19:18:20, 12.25s/it] + 23%|██▎ | 1705/7378 
[5:51:21<19:14:58, 12.22s/it] + +{'loss': 0.4931, 'learning_rate': 1.7954421464142908e-05, 'epoch': 0.23} + + 23%|██▎ | 1705/7378 [5:51:21<19:14:58, 12.22s/it] + 23%|██▎ | 1706/7378 [5:51:33<19:13:05, 12.20s/it] + +{'loss': 0.4402, 'learning_rate': 1.7951760136028023e-05, 'epoch': 0.23} + + 23%|██▎ | 1706/7378 [5:51:33<19:13:05, 12.20s/it] + 23%|██▎ | 1707/7378 [5:51:45<19:13:50, 12.21s/it] + +{'loss': 0.4575, 'learning_rate': 1.794909727533599e-05, 'epoch': 0.23} + + 23%|██▎ | 1707/7378 [5:51:45<19:13:50, 12.21s/it] + 23%|██▎ | 1708/7378 [5:51:57<19:12:13, 12.19s/it] + +{'loss': 0.4509, 'learning_rate': 1.7946432882580032e-05, 'epoch': 0.23} + + 23%|██▎ | 1708/7378 [5:51:57<19:12:13, 12.19s/it] + 23%|██▎ | 1709/7378 [5:52:09<19:03:26, 12.10s/it] + +{'loss': 0.4389, 'learning_rate': 1.794376695827367e-05, 'epoch': 0.23} + + 23%|██▎ | 1709/7378 [5:52:09<19:03:26, 12.10s/it] + 23%|██▎ | 1710/7378 [5:52:21<19:09:36, 12.17s/it] + +{'loss': 0.4834, 'learning_rate': 1.7941099502930716e-05, 'epoch': 0.23} + + 23%|██▎ | 1710/7378 [5:52:21<19:09:36, 12.17s/it] + 23%|██▎ | 1711/7378 [5:52:34<19:19:55, 12.28s/it] + +{'loss': 0.4853, 'learning_rate': 1.793843051706529e-05, 'epoch': 0.23} + + 23%|██▎ | 1711/7378 [5:52:34<19:19:55, 12.28s/it] + 23%|██▎ | 1712/7378 [5:52:46<19:22:56, 12.31s/it] + +{'loss': 0.4609, 'learning_rate': 1.7935760001191785e-05, 'epoch': 0.23} + + 23%|██▎ | 1712/7378 [5:52:46<19:22:56, 12.31s/it] + 23%|██▎ | 1713/7378 [5:52:58<19:14:19, 12.23s/it] + +{'loss': 0.5197, 'learning_rate': 1.7933087955824908e-05, 'epoch': 0.23} + + 23%|██▎ | 1713/7378 [5:52:58<19:14:19, 12.23s/it] + 23%|██▎ | 1714/7378 [5:53:11<19:26:09, 12.35s/it] + +{'loss': 0.4367, 'learning_rate': 1.7930414381479653e-05, 'epoch': 0.23} + + 23%|██▎ | 1714/7378 [5:53:11<19:26:09, 12.35s/it] + 23%|██▎ | 1715/7378 [5:53:23<19:23:59, 12.33s/it] + +{'loss': 0.4922, 'learning_rate': 1.7927739278671307e-05, 'epoch': 0.23} + + 23%|██▎ | 1715/7378 [5:53:23<19:23:59, 12.33s/it] + 23%|██▎ | 1716/7378 
[5:53:36<19:29:59, 12.40s/it] + +{'loss': 0.509, 'learning_rate': 1.7925062647915462e-05, 'epoch': 0.23} + + 23%|██▎ | 1716/7378 [5:53:36<19:29:59, 12.40s/it] + 23%|██▎ | 1717/7378 [5:53:48<19:27:16, 12.37s/it] + +{'loss': 0.4785, 'learning_rate': 1.7922384489727985e-05, 'epoch': 0.23} + + 23%|██▎ | 1717/7378 [5:53:48<19:27:16, 12.37s/it] + 23%|██▎ | 1718/7378 [5:54:00<19:22:59, 12.33s/it] + +{'loss': 0.4831, 'learning_rate': 1.7919704804625055e-05, 'epoch': 0.23} + + 23%|██▎ | 1718/7378 [5:54:00<19:22:59, 12.33s/it] + 23%|██▎ | 1719/7378 [5:54:13<19:26:44, 12.37s/it] + +{'loss': 0.5146, 'learning_rate': 1.7917023593123143e-05, 'epoch': 0.23} + + 23%|██▎ | 1719/7378 [5:54:13<19:26:44, 12.37s/it] + 23%|██▎ | 1720/7378 [5:54:26<19:38:28, 12.50s/it] + +{'loss': 0.4871, 'learning_rate': 1.7914340855739004e-05, 'epoch': 0.23} + + 23%|██▎ | 1720/7378 [5:54:26<19:38:28, 12.50s/it] + 23%|██▎ | 1721/7378 [5:54:38<19:36:25, 12.48s/it] + +{'loss': 0.4991, 'learning_rate': 1.7911656592989697e-05, 'epoch': 0.23} + + 23%|██▎ | 1721/7378 [5:54:38<19:36:25, 12.48s/it] + 23%|██▎ | 1722/7378 [5:54:51<19:44:29, 12.57s/it] + +{'loss': 0.5231, 'learning_rate': 1.790897080539257e-05, 'epoch': 0.23} + + 23%|██▎ | 1722/7378 [5:54:51<19:44:29, 12.57s/it] + 23%|██▎ | 1723/7378 [5:55:03<19:45:57, 12.58s/it] + +{'loss': 0.4608, 'learning_rate': 1.790628349346527e-05, 'epoch': 0.23} + + 23%|██▎ | 1723/7378 [5:55:03<19:45:57, 12.58s/it] + 23%|██▎ | 1724/7378 [5:55:16<19:36:42, 12.49s/it] + +{'loss': 0.4801, 'learning_rate': 1.7903594657725728e-05, 'epoch': 0.23} + + 23%|██▎ | 1724/7378 [5:55:16<19:36:42, 12.49s/it] + 23%|██▎ | 1725/7378 [5:55:28<19:43:54, 12.57s/it] + +{'loss': 0.4914, 'learning_rate': 1.790090429869218e-05, 'epoch': 0.23} + + 23%|██▎ | 1725/7378 [5:55:28<19:43:54, 12.57s/it] + 23%|██▎ | 1726/7378 [5:55:41<19:56:44, 12.70s/it] + +{'loss': 0.5335, 'learning_rate': 1.789821241688315e-05, 'epoch': 0.23} + + 23%|██▎ | 1726/7378 [5:55:41<19:56:44, 12.70s/it] + 23%|██▎ | 1727/7378 
[5:55:54<19:50:46, 12.64s/it] + +{'loss': 0.6049, 'learning_rate': 1.7895519012817452e-05, 'epoch': 0.23} + + 23%|██▎ | 1727/7378 [5:55:54<19:50:46, 12.64s/it] + 23%|██▎ | 1728/7378 [5:56:06<19:49:33, 12.63s/it] + +{'loss': 0.4969, 'learning_rate': 1.78928240870142e-05, 'epoch': 0.23} + + 23%|██▎ | 1728/7378 [5:56:06<19:49:33, 12.63s/it] + 23%|██▎ | 1729/7378 [5:56:19<19:38:47, 12.52s/it] + +{'loss': 0.4657, 'learning_rate': 1.7890127639992803e-05, 'epoch': 0.23} + + 23%|██▎ | 1729/7378 [5:56:19<19:38:47, 12.52s/it] + 23%|██▎ | 1730/7378 [5:56:31<19:24:10, 12.37s/it] + +{'loss': 0.4227, 'learning_rate': 1.7887429672272954e-05, 'epoch': 0.23} + + 23%|██▎ | 1730/7378 [5:56:31<19:24:10, 12.37s/it] + 23%|██▎ | 1731/7378 [5:56:43<19:19:11, 12.32s/it] + +{'loss': 0.5082, 'learning_rate': 1.7884730184374645e-05, 'epoch': 0.23} + + 23%|██▎ | 1731/7378 [5:56:43<19:19:11, 12.32s/it] + 23%|██▎ | 1732/7378 [5:56:55<19:17:14, 12.30s/it] + +{'loss': 0.4013, 'learning_rate': 1.7882029176818157e-05, 'epoch': 0.23} + + 23%|██▎ | 1732/7378 [5:56:55<19:17:14, 12.30s/it] + 23%|██▎ | 1733/7378 [5:57:07<19:08:35, 12.21s/it] + +{'loss': 0.4384, 'learning_rate': 1.787932665012407e-05, 'epoch': 0.23} + + 23%|██▎ | 1733/7378 [5:57:07<19:08:35, 12.21s/it] + 24%|██▎ | 1734/7378 [5:57:19<19:04:15, 12.16s/it] + +{'loss': 0.5212, 'learning_rate': 1.787662260481326e-05, 'epoch': 0.24} + + 24%|██▎ | 1734/7378 [5:57:19<19:04:15, 12.16s/it] + 24%|██▎ | 1735/7378 [5:57:32<19:06:42, 12.19s/it] + +{'loss': 0.4593, 'learning_rate': 1.7873917041406875e-05, 'epoch': 0.24} + + 24%|██▎ | 1735/7378 [5:57:32<19:06:42, 12.19s/it] + 24%|██▎ | 1736/7378 [5:57:44<19:07:43, 12.21s/it] + +{'loss': 0.4926, 'learning_rate': 1.7871209960426383e-05, 'epoch': 0.24} + + 24%|██▎ | 1736/7378 [5:57:44<19:07:43, 12.21s/it] + 24%|██▎ | 1737/7378 [5:57:56<19:16:08, 12.30s/it] + +{'loss': 0.5148, 'learning_rate': 1.7868501362393525e-05, 'epoch': 0.24} + + 24%|██▎ | 1737/7378 [5:57:56<19:16:08, 12.30s/it] + 24%|██▎ | 1738/7378 
[5:58:09<19:23:23, 12.38s/it] + +{'loss': 0.4489, 'learning_rate': 1.7865791247830344e-05, 'epoch': 0.24} + + 24%|██▎ | 1738/7378 [5:58:09<19:23:23, 12.38s/it] + 24%|██▎ | 1739/7378 [5:58:21<19:21:04, 12.35s/it] + +{'loss': 0.4874, 'learning_rate': 1.7863079617259168e-05, 'epoch': 0.24} + + 24%|██▎ | 1739/7378 [5:58:21<19:21:04, 12.35s/it] + 24%|██▎ | 1740/7378 [5:58:33<19:20:08, 12.35s/it] + +{'loss': 0.5113, 'learning_rate': 1.7860366471202622e-05, 'epoch': 0.24} + + 24%|██▎ | 1740/7378 [5:58:33<19:20:08, 12.35s/it] + 24%|██▎ | 1741/7378 [5:58:46<19:24:37, 12.40s/it] + +{'loss': 0.5078, 'learning_rate': 1.785765181018363e-05, 'epoch': 0.24} + + 24%|██▎ | 1741/7378 [5:58:46<19:24:37, 12.40s/it] + 24%|██▎ | 1742/7378 [5:58:58<19:16:26, 12.31s/it] + +{'loss': 0.471, 'learning_rate': 1.785493563472539e-05, 'epoch': 0.24} + + 24%|██▎ | 1742/7378 [5:58:58<19:16:26, 12.31s/it] + 24%|██▎ | 1743/7378 [5:59:10<19:12:31, 12.27s/it] + +{'loss': 0.5122, 'learning_rate': 1.7852217945351404e-05, 'epoch': 0.24} + + 24%|██▎ | 1743/7378 [5:59:10<19:12:31, 12.27s/it] + 24%|██▎ | 1744/7378 [5:59:22<19:04:11, 12.19s/it] + +{'loss': 0.4419, 'learning_rate': 1.784949874258547e-05, 'epoch': 0.24} + + 24%|██▎ | 1744/7378 [5:59:22<19:04:11, 12.19s/it] + 24%|██▎ | 1745/7378 [5:59:35<19:06:17, 12.21s/it] + +{'loss': 0.4442, 'learning_rate': 1.7846778026951667e-05, 'epoch': 0.24} + + 24%|██▎ | 1745/7378 [5:59:35<19:06:17, 12.21s/it] + 24%|██▎ | 1746/7378 [5:59:47<19:04:57, 12.20s/it] + +{'loss': 0.5047, 'learning_rate': 1.7844055798974372e-05, 'epoch': 0.24} + + 24%|██▎ | 1746/7378 [5:59:47<19:04:57, 12.20s/it] + 24%|██▎ | 1747/7378 [5:59:59<18:56:20, 12.11s/it] + +{'loss': 0.5206, 'learning_rate': 1.7841332059178254e-05, 'epoch': 0.24} + + 24%|██▎ | 1747/7378 [5:59:59<18:56:20, 12.11s/it] + 24%|██▎ | 1748/7378 [6:00:11<19:00:36, 12.16s/it] + +{'loss': 0.4658, 'learning_rate': 1.7838606808088265e-05, 'epoch': 0.24} + + 24%|██▎ | 1748/7378 [6:00:11<19:00:36, 12.16s/it] + 24%|██▎ | 1749/7378 
[6:00:23<19:09:14, 12.25s/it] + +{'loss': 0.4384, 'learning_rate': 1.783588004622966e-05, 'epoch': 0.24} + + 24%|██▎ | 1749/7378 [6:00:23<19:09:14, 12.25s/it] + 24%|██▎ | 1750/7378 [6:00:36<19:16:38, 12.33s/it] + +{'loss': 0.4695, 'learning_rate': 1.7833151774127978e-05, 'epoch': 0.24} + + 24%|██▎ | 1750/7378 [6:00:36<19:16:38, 12.33s/it] + 24%|██▎ | 1751/7378 [6:00:48<19:17:56, 12.35s/it] + +{'loss': 0.4746, 'learning_rate': 1.7830421992309047e-05, 'epoch': 0.24} + + 24%|██▎ | 1751/7378 [6:00:48<19:17:56, 12.35s/it] + 24%|██▎ | 1752/7378 [6:01:01<19:23:12, 12.41s/it] + +{'loss': 0.4654, 'learning_rate': 1.7827690701298995e-05, 'epoch': 0.24} + + 24%|██▎ | 1752/7378 [6:01:01<19:23:12, 12.41s/it] + 24%|██▍ | 1753/7378 [6:01:13<19:19:52, 12.37s/it] + +{'loss': 0.5062, 'learning_rate': 1.7824957901624236e-05, 'epoch': 0.24} + + 24%|██▍ | 1753/7378 [6:01:13<19:19:52, 12.37s/it] + 24%|██▍ | 1754/7378 [6:01:26<19:31:05, 12.49s/it] + +{'loss': 0.447, 'learning_rate': 1.7822223593811468e-05, 'epoch': 0.24} + + 24%|██▍ | 1754/7378 [6:01:26<19:31:05, 12.49s/it] + 24%|██▍ | 1755/7378 [6:01:38<19:19:05, 12.37s/it] + +{'loss': 0.4515, 'learning_rate': 1.781948777838769e-05, 'epoch': 0.24} + + 24%|██▍ | 1755/7378 [6:01:38<19:19:05, 12.37s/it] + 24%|██▍ | 1756/7378 [6:01:50<19:09:01, 12.26s/it] + +{'loss': 0.444, 'learning_rate': 1.781675045588019e-05, 'epoch': 0.24} + + 24%|██▍ | 1756/7378 [6:01:50<19:09:01, 12.26s/it] + 24%|██▍ | 1757/7378 [6:02:02<19:02:55, 12.20s/it] + +{'loss': 0.5015, 'learning_rate': 1.781401162681654e-05, 'epoch': 0.24} + + 24%|██▍ | 1757/7378 [6:02:02<19:02:55, 12.20s/it] + 24%|██▍ | 1758/7378 [6:02:15<19:12:41, 12.31s/it] + +{'loss': 0.463, 'learning_rate': 1.781127129172461e-05, 'epoch': 0.24} + + 24%|██▍ | 1758/7378 [6:02:15<19:12:41, 12.31s/it] + 24%|██▍ | 1759/7378 [6:02:31<21:15:31, 13.62s/it] + +{'loss': 0.4404, 'learning_rate': 1.780852945113255e-05, 'epoch': 0.24} + + 24%|██▍ | 1759/7378 [6:02:31<21:15:31, 13.62s/it] + 24%|██▍ | 1760/7378 
[6:02:44<20:37:43, 13.22s/it] + +{'loss': 0.5301, 'learning_rate': 1.7805786105568813e-05, 'epoch': 0.24} + + 24%|██▍ | 1760/7378 [6:02:44<20:37:43, 13.22s/it] + 24%|██▍ | 1761/7378 [6:02:56<20:13:46, 12.97s/it] + +{'loss': 0.4346, 'learning_rate': 1.7803041255562137e-05, 'epoch': 0.24} + + 24%|██▍ | 1761/7378 [6:02:56<20:13:46, 12.97s/it] + 24%|██▍ | 1762/7378 [6:03:08<19:50:18, 12.72s/it] + +{'loss': 0.4309, 'learning_rate': 1.780029490164154e-05, 'epoch': 0.24} + + 24%|██▍ | 1762/7378 [6:03:08<19:50:18, 12.72s/it] + 24%|██▍ | 1763/7378 [6:03:21<19:43:50, 12.65s/it] + +{'loss': 0.4684, 'learning_rate': 1.779754704433635e-05, 'epoch': 0.24} + + 24%|██▍ | 1763/7378 [6:03:21<19:43:50, 12.65s/it] + 24%|██▍ | 1764/7378 [6:03:33<19:41:25, 12.63s/it] + +{'loss': 0.4521, 'learning_rate': 1.7794797684176165e-05, 'epoch': 0.24} + + 24%|██▍ | 1764/7378 [6:03:33<19:41:25, 12.63s/it] + 24%|██▍ | 1765/7378 [6:03:45<19:33:49, 12.55s/it] + +{'loss': 0.5144, 'learning_rate': 1.7792046821690885e-05, 'epoch': 0.24} + + 24%|██▍ | 1765/7378 [6:03:45<19:33:49, 12.55s/it] + 24%|██▍ | 1766/7378 [6:03:58<19:31:55, 12.53s/it] + +{'loss': 0.5256, 'learning_rate': 1.7789294457410693e-05, 'epoch': 0.24} + + 24%|██▍ | 1766/7378 [6:03:58<19:31:55, 12.53s/it] + 24%|██▍ | 1767/7378 [6:04:10<19:24:48, 12.46s/it] + +{'loss': 0.4484, 'learning_rate': 1.7786540591866067e-05, 'epoch': 0.24} + + 24%|██▍ | 1767/7378 [6:04:10<19:24:48, 12.46s/it] + 24%|██▍ | 1768/7378 [6:04:23<19:21:12, 12.42s/it] + +{'loss': 0.4173, 'learning_rate': 1.7783785225587774e-05, 'epoch': 0.24} + + 24%|██▍ | 1768/7378 [6:04:23<19:21:12, 12.42s/it] + 24%|██▍ | 1769/7378 [6:04:35<19:20:24, 12.41s/it] + +{'loss': 0.4612, 'learning_rate': 1.7781028359106856e-05, 'epoch': 0.24} + + 24%|██▍ | 1769/7378 [6:04:35<19:20:24, 12.41s/it] + 24%|██▍ | 1770/7378 [6:04:47<19:20:19, 12.41s/it] + +{'loss': 0.5847, 'learning_rate': 1.777826999295467e-05, 'epoch': 0.24} + + 24%|██▍ | 1770/7378 [6:04:47<19:20:19, 12.41s/it] + 24%|██▍ | 1771/7378 
[6:04:59<19:10:44, 12.31s/it] + +{'loss': 0.4857, 'learning_rate': 1.777551012766284e-05, 'epoch': 0.24} + + 24%|██▍ | 1771/7378 [6:04:59<19:10:44, 12.31s/it] + 24%|██▍ | 1772/7378 [6:05:12<19:21:34, 12.43s/it] + +{'loss': 0.4691, 'learning_rate': 1.7772748763763288e-05, 'epoch': 0.24} + + 24%|██▍ | 1772/7378 [6:05:12<19:21:34, 12.43s/it] + 24%|██▍ | 1773/7378 [6:05:24<19:06:03, 12.27s/it] + +{'loss': 0.5149, 'learning_rate': 1.7769985901788223e-05, 'epoch': 0.24} + + 24%|██▍ | 1773/7378 [6:05:24<19:06:03, 12.27s/it] + 24%|██▍ | 1774/7378 [6:05:36<19:05:47, 12.27s/it] + +{'loss': 0.4862, 'learning_rate': 1.7767221542270146e-05, 'epoch': 0.24} + + 24%|██▍ | 1774/7378 [6:05:36<19:05:47, 12.27s/it] + 24%|██▍ | 1775/7378 [6:05:49<19:08:24, 12.30s/it] + +{'loss': 0.5084, 'learning_rate': 1.776445568574184e-05, 'epoch': 0.24} + + 24%|██▍ | 1775/7378 [6:05:49<19:08:24, 12.30s/it] + 24%|██▍ | 1776/7378 [6:06:01<19:08:24, 12.30s/it] + +{'loss': 0.4378, 'learning_rate': 1.7761688332736385e-05, 'epoch': 0.24} + + 24%|██▍ | 1776/7378 [6:06:01<19:08:24, 12.30s/it] + 24%|██▍ | 1777/7378 [6:06:14<19:24:11, 12.47s/it] + +{'loss': 0.475, 'learning_rate': 1.7758919483787146e-05, 'epoch': 0.24} + + 24%|██▍ | 1777/7378 [6:06:14<19:24:11, 12.47s/it] + 24%|██▍ | 1778/7378 [6:06:26<19:20:25, 12.43s/it] + +{'loss': 0.5432, 'learning_rate': 1.7756149139427764e-05, 'epoch': 0.24} + + 24%|██▍ | 1778/7378 [6:06:26<19:20:25, 12.43s/it] + 24%|██▍ | 1779/7378 [6:06:38<19:15:46, 12.39s/it] + +{'loss': 0.5313, 'learning_rate': 1.7753377300192196e-05, 'epoch': 0.24} + + 24%|██▍ | 1779/7378 [6:06:38<19:15:46, 12.39s/it] + 24%|██▍ | 1780/7378 [6:06:51<19:07:27, 12.30s/it] + +{'loss': 0.5085, 'learning_rate': 1.7750603966614654e-05, 'epoch': 0.24} + + 24%|██▍ | 1780/7378 [6:06:51<19:07:27, 12.30s/it] + 24%|██▍ | 1781/7378 [6:07:04<19:25:23, 12.49s/it] + +{'loss': 0.4608, 'learning_rate': 1.7747829139229664e-05, 'epoch': 0.24} + + 24%|██▍ | 1781/7378 [6:07:04<19:25:23, 12.49s/it] + 24%|██▍ | 1782/7378 
[6:07:16<19:30:38, 12.55s/it] + +{'loss': 0.4078, 'learning_rate': 1.7745052818572033e-05, 'epoch': 0.24} + + 24%|██▍ | 1782/7378 [6:07:16<19:30:38, 12.55s/it] + 24%|██▍ | 1783/7378 [6:07:29<19:25:48, 12.50s/it] + +{'loss': 0.478, 'learning_rate': 1.7742275005176845e-05, 'epoch': 0.24} + + 24%|██▍ | 1783/7378 [6:07:29<19:25:48, 12.50s/it] + 24%|██▍ | 1784/7378 [6:07:41<19:25:53, 12.51s/it] + +{'loss': 0.4855, 'learning_rate': 1.7739495699579488e-05, 'epoch': 0.24} + + 24%|██▍ | 1784/7378 [6:07:41<19:25:53, 12.51s/it] + 24%|██▍ | 1785/7378 [6:07:53<19:19:03, 12.43s/it] + +{'loss': 0.5992, 'learning_rate': 1.7736714902315624e-05, 'epoch': 0.24} + + 24%|██▍ | 1785/7378 [6:07:53<19:19:03, 12.43s/it] + 24%|██▍ | 1786/7378 [6:08:05<19:03:22, 12.27s/it] + +{'loss': 0.448, 'learning_rate': 1.773393261392121e-05, 'epoch': 0.24} + + 24%|██▍ | 1786/7378 [6:08:05<19:03:22, 12.27s/it] + 24%|██▍ | 1787/7378 [6:08:17<19:01:02, 12.25s/it] + +{'loss': 0.514, 'learning_rate': 1.773114883493249e-05, 'epoch': 0.24} + + 24%|██▍ | 1787/7378 [6:08:17<19:01:02, 12.25s/it] + 24%|██▍ | 1788/7378 [6:08:29<18:55:20, 12.19s/it] + +{'loss': 0.4831, 'learning_rate': 1.772836356588599e-05, 'epoch': 0.24} + + 24%|██▍ | 1788/7378 [6:08:30<18:55:20, 12.19s/it] + 24%|██▍ | 1789/7378 [6:08:42<19:10:46, 12.35s/it] + +{'loss': 0.4774, 'learning_rate': 1.7725576807318533e-05, 'epoch': 0.24} + + 24%|██▍ | 1789/7378 [6:08:42<19:10:46, 12.35s/it] + 24%|██▍ | 1790/7378 [6:08:55<19:07:54, 12.33s/it] + +{'loss': 0.4477, 'learning_rate': 1.772278855976721e-05, 'epoch': 0.24} + + 24%|██▍ | 1790/7378 [6:08:55<19:07:54, 12.33s/it] + 24%|██▍ | 1791/7378 [6:09:07<19:07:10, 12.32s/it] + +{'loss': 0.5042, 'learning_rate': 1.7719998823769432e-05, 'epoch': 0.24} + + 24%|██▍ | 1791/7378 [6:09:07<19:07:10, 12.32s/it] + 24%|██▍ | 1792/7378 [6:09:19<19:02:09, 12.27s/it] + +{'loss': 0.5177, 'learning_rate': 1.771720759986286e-05, 'epoch': 0.24} + + 24%|██▍ | 1792/7378 [6:09:19<19:02:09, 12.27s/it] + 24%|██▍ | 1793/7378 
[6:09:31<18:57:32, 12.22s/it] + +{'loss': 0.4579, 'learning_rate': 1.771441488858547e-05, 'epoch': 0.24} + + 24%|██▍ | 1793/7378 [6:09:31<18:57:32, 12.22s/it] + 24%|██▍ | 1794/7378 [6:09:43<18:48:36, 12.13s/it] + +{'loss': 0.4555, 'learning_rate': 1.7711620690475505e-05, 'epoch': 0.24} + + 24%|██▍ | 1794/7378 [6:09:43<18:48:36, 12.13s/it] + 24%|██▍ | 1795/7378 [6:09:55<18:55:29, 12.20s/it] + +{'loss': 0.5385, 'learning_rate': 1.7708825006071502e-05, 'epoch': 0.24} + + 24%|██▍ | 1795/7378 [6:09:55<18:55:29, 12.20s/it] + 24%|██▍ | 1796/7378 [6:10:07<18:52:16, 12.17s/it] + +{'loss': 0.4944, 'learning_rate': 1.7706027835912296e-05, 'epoch': 0.24} + + 24%|██▍ | 1796/7378 [6:10:07<18:52:16, 12.17s/it] + 24%|██▍ | 1797/7378 [6:10:20<18:53:58, 12.19s/it] + +{'loss': 0.5069, 'learning_rate': 1.7703229180536988e-05, 'epoch': 0.24} + + 24%|██▍ | 1797/7378 [6:10:20<18:53:58, 12.19s/it] + 24%|██▍ | 1798/7378 [6:10:32<19:03:22, 12.29s/it] + +{'loss': 0.4247, 'learning_rate': 1.770042904048498e-05, 'epoch': 0.24} + + 24%|██▍ | 1798/7378 [6:10:32<19:03:22, 12.29s/it] + 24%|██▍ | 1799/7378 [6:10:45<19:12:36, 12.40s/it] + +{'loss': 0.51, 'learning_rate': 1.7697627416295953e-05, 'epoch': 0.24} + + 24%|██▍ | 1799/7378 [6:10:45<19:12:36, 12.40s/it] + 24%|██▍ | 1800/7378 [6:10:57<19:08:51, 12.36s/it] + +{'loss': 0.4766, 'learning_rate': 1.7694824308509875e-05, 'epoch': 0.24} + + 24%|██▍ | 1800/7378 [6:10:57<19:08:51, 12.36s/it] + 24%|██▍ | 1801/7378 [6:11:09<19:03:08, 12.30s/it] + +{'loss': 0.5033, 'learning_rate': 1.7692019717667002e-05, 'epoch': 0.24} + + 24%|██▍ | 1801/7378 [6:11:09<19:03:08, 12.30s/it] + 24%|██▍ | 1802/7378 [6:11:22<19:02:18, 12.29s/it] + +{'loss': 0.5383, 'learning_rate': 1.7689213644307875e-05, 'epoch': 0.24} + + 24%|██▍ | 1802/7378 [6:11:22<19:02:18, 12.29s/it] + 24%|██▍ | 1803/7378 [6:11:34<18:59:02, 12.26s/it] + +{'loss': 0.4599, 'learning_rate': 1.7686406088973324e-05, 'epoch': 0.24} + + 24%|██▍ | 1803/7378 [6:11:34<18:59:02, 12.26s/it] + 24%|██▍ | 1804/7378 
[6:11:46<19:00:20, 12.27s/it] + +{'loss': 0.4884, 'learning_rate': 1.7683597052204456e-05, 'epoch': 0.24} + + 24%|██▍ | 1804/7378 [6:11:46<19:00:20, 12.27s/it] + 24%|██▍ | 1805/7378 [6:11:58<19:02:02, 12.30s/it] + +{'loss': 0.4656, 'learning_rate': 1.7680786534542673e-05, 'epoch': 0.24} + + 24%|██▍ | 1805/7378 [6:11:58<19:02:02, 12.30s/it] + 24%|██▍ | 1806/7378 [6:12:11<19:03:21, 12.31s/it] + +{'loss': 0.5052, 'learning_rate': 1.7677974536529657e-05, 'epoch': 0.24} + + 24%|██▍ | 1806/7378 [6:12:11<19:03:21, 12.31s/it] + 24%|██▍ | 1807/7378 [6:12:23<19:12:49, 12.42s/it] + +{'loss': 0.4182, 'learning_rate': 1.7675161058707372e-05, 'epoch': 0.24} + + 24%|██▍ | 1807/7378 [6:12:23<19:12:49, 12.42s/it] + 25%|██▍ | 1808/7378 [6:12:36<19:09:04, 12.38s/it] + +{'loss': 0.4506, 'learning_rate': 1.767234610161808e-05, 'epoch': 0.25} + + 25%|██▍ | 1808/7378 [6:12:36<19:09:04, 12.38s/it] + 25%|██▍ | 1809/7378 [6:12:48<19:06:39, 12.35s/it] + +{'loss': 0.5211, 'learning_rate': 1.7669529665804312e-05, 'epoch': 0.25} + + 25%|██▍ | 1809/7378 [6:12:48<19:06:39, 12.35s/it] + 25%|██▍ | 1810/7378 [6:13:00<19:03:40, 12.32s/it] + +{'loss': 0.4964, 'learning_rate': 1.76667117518089e-05, 'epoch': 0.25} + + 25%|██▍ | 1810/7378 [6:13:00<19:03:40, 12.32s/it] + 25%|██▍ | 1811/7378 [6:13:13<19:02:13, 12.31s/it] + +{'loss': 0.4665, 'learning_rate': 1.7663892360174943e-05, 'epoch': 0.25} + + 25%|██▍ | 1811/7378 [6:13:13<19:02:13, 12.31s/it] + 25%|██▍ | 1812/7378 [6:13:25<19:12:39, 12.43s/it] + +{'loss': 0.4646, 'learning_rate': 1.7661071491445843e-05, 'epoch': 0.25} + + 25%|██▍ | 1812/7378 [6:13:25<19:12:39, 12.43s/it] + 25%|██▍ | 1813/7378 [6:13:38<19:14:17, 12.45s/it] + +{'loss': 0.5117, 'learning_rate': 1.7658249146165273e-05, 'epoch': 0.25} + + 25%|██▍ | 1813/7378 [6:13:38<19:14:17, 12.45s/it] + 25%|██▍ | 1814/7378 [6:13:50<19:10:55, 12.41s/it] + +{'loss': 0.4806, 'learning_rate': 1.76554253248772e-05, 'epoch': 0.25} + + 25%|██▍ | 1814/7378 [6:13:50<19:10:55, 12.41s/it] + 25%|██▍ | 1815/7378 
[6:14:02<19:09:53, 12.40s/it] + +{'loss': 0.5001, 'learning_rate': 1.765260002812587e-05, 'epoch': 0.25} + + 25%|██▍ | 1815/7378 [6:14:02<19:09:53, 12.40s/it] + 25%|██▍ | 1816/7378 [6:14:15<19:02:37, 12.33s/it] + +{'loss': 0.4618, 'learning_rate': 1.7649773256455807e-05, 'epoch': 0.25} + + 25%|██▍ | 1816/7378 [6:14:15<19:02:37, 12.33s/it] + 25%|██▍ | 1817/7378 [6:14:27<19:01:26, 12.32s/it] + +{'loss': 0.4005, 'learning_rate': 1.7646945010411843e-05, 'epoch': 0.25} + + 25%|██▍ | 1817/7378 [6:14:27<19:01:26, 12.32s/it] + 25%|██▍ | 1818/7378 [6:14:39<18:57:42, 12.28s/it] + +{'loss': 0.4289, 'learning_rate': 1.764411529053906e-05, 'epoch': 0.25} + + 25%|██▍ | 1818/7378 [6:14:39<18:57:42, 12.28s/it] + 25%|██▍ | 1819/7378 [6:14:52<19:02:57, 12.34s/it] + +{'loss': 0.4375, 'learning_rate': 1.764128409738286e-05, 'epoch': 0.25} + + 25%|██▍ | 1819/7378 [6:14:52<19:02:57, 12.34s/it] + 25%|██▍ | 1820/7378 [6:15:04<19:15:56, 12.48s/it] + +{'loss': 0.3984, 'learning_rate': 1.7638451431488897e-05, 'epoch': 0.25} + + 25%|██▍ | 1820/7378 [6:15:04<19:15:56, 12.48s/it] + 25%|██▍ | 1821/7378 [6:15:17<19:13:08, 12.45s/it] + +{'loss': 0.4845, 'learning_rate': 1.7635617293403127e-05, 'epoch': 0.25} + + 25%|██▍ | 1821/7378 [6:15:17<19:13:08, 12.45s/it] + 25%|██▍ | 1822/7378 [6:15:29<19:03:52, 12.35s/it] + +{'loss': 0.4839, 'learning_rate': 1.7632781683671787e-05, 'epoch': 0.25} + + 25%|██▍ | 1822/7378 [6:15:29<19:03:52, 12.35s/it] + 25%|██▍ | 1823/7378 [6:15:41<18:55:37, 12.27s/it] + +{'loss': 0.4583, 'learning_rate': 1.7629944602841398e-05, 'epoch': 0.25} + + 25%|██▍ | 1823/7378 [6:15:41<18:55:37, 12.27s/it] + 25%|██▍ | 1824/7378 [6:15:53<18:50:17, 12.21s/it] + +{'loss': 0.4648, 'learning_rate': 1.762710605145876e-05, 'epoch': 0.25} + + 25%|██▍ | 1824/7378 [6:15:53<18:50:17, 12.21s/it] + 25%|██▍ | 1825/7378 [6:16:05<18:54:39, 12.26s/it] + +{'loss': 0.4998, 'learning_rate': 1.762426603007096e-05, 'epoch': 0.25} + + 25%|██▍ | 1825/7378 [6:16:05<18:54:39, 12.26s/it] + 25%|██▍ | 1826/7378 
[6:16:17<18:49:46, 12.21s/it] + +{'loss': 0.47, 'learning_rate': 1.7621424539225368e-05, 'epoch': 0.25} + + 25%|██▍ | 1826/7378 [6:16:17<18:49:46, 12.21s/it] + 25%|██▍ | 1827/7378 [6:16:30<18:45:03, 12.16s/it] + +{'loss': 0.456, 'learning_rate': 1.7618581579469638e-05, 'epoch': 0.25} + + 25%|██▍ | 1827/7378 [6:16:30<18:45:03, 12.16s/it] + 25%|██▍ | 1828/7378 [6:16:42<18:44:01, 12.15s/it] + +{'loss': 0.4579, 'learning_rate': 1.7615737151351705e-05, 'epoch': 0.25} + + 25%|██▍ | 1828/7378 [6:16:42<18:44:01, 12.15s/it] + 25%|██▍ | 1829/7378 [6:16:54<18:54:20, 12.27s/it] + +{'loss': 0.4495, 'learning_rate': 1.7612891255419788e-05, 'epoch': 0.25} + + 25%|██▍ | 1829/7378 [6:16:54<18:54:20, 12.27s/it] + 25%|██▍ | 1830/7378 [6:17:07<19:03:08, 12.36s/it] + +{'loss': 0.4999, 'learning_rate': 1.7610043892222382e-05, 'epoch': 0.25} + + 25%|██▍ | 1830/7378 [6:17:07<19:03:08, 12.36s/it] + 25%|██▍ | 1831/7378 [6:17:19<19:03:11, 12.37s/it] + +{'loss': 0.4496, 'learning_rate': 1.7607195062308285e-05, 'epoch': 0.25} + + 25%|██▍ | 1831/7378 [6:17:19<19:03:11, 12.37s/it] + 25%|██▍ | 1832/7378 [6:17:31<19:02:11, 12.36s/it] + +{'loss': 0.4827, 'learning_rate': 1.7604344766226557e-05, 'epoch': 0.25} + + 25%|██▍ | 1832/7378 [6:17:31<19:02:11, 12.36s/it] + 25%|██▍ | 1833/7378 [6:17:44<19:06:13, 12.40s/it] + +{'loss': 0.5256, 'learning_rate': 1.7601493004526546e-05, 'epoch': 0.25} + + 25%|██▍ | 1833/7378 [6:17:44<19:06:13, 12.40s/it] + 25%|██▍ | 1834/7378 [6:17:56<19:05:47, 12.40s/it] + +{'loss': 0.41, 'learning_rate': 1.7598639777757888e-05, 'epoch': 0.25} + + 25%|██▍ | 1834/7378 [6:17:56<19:05:47, 12.40s/it] + 25%|██▍ | 1835/7378 [6:18:09<19:04:08, 12.38s/it] + +{'loss': 0.4678, 'learning_rate': 1.7595785086470494e-05, 'epoch': 0.25} + + 25%|██▍ | 1835/7378 [6:18:09<19:04:08, 12.38s/it] + 25%|██▍ | 1836/7378 [6:18:21<18:56:40, 12.31s/it] + +{'loss': 0.4862, 'learning_rate': 1.7592928931214567e-05, 'epoch': 0.25} + + 25%|██▍ | 1836/7378 [6:18:21<18:56:40, 12.31s/it] + 25%|██▍ | 1837/7378 
[6:18:33<19:02:26, 12.37s/it] + +{'loss': 0.4865, 'learning_rate': 1.759007131254058e-05, 'epoch': 0.25} + + 25%|██▍ | 1837/7378 [6:18:33<19:02:26, 12.37s/it] + 25%|██▍ | 1838/7378 [6:18:46<19:04:07, 12.39s/it] + +{'loss': 0.3626, 'learning_rate': 1.7587212230999298e-05, 'epoch': 0.25} + + 25%|██▍ | 1838/7378 [6:18:46<19:04:07, 12.39s/it] + 25%|██▍ | 1839/7378 [6:18:58<19:03:32, 12.39s/it] + +{'loss': 0.454, 'learning_rate': 1.758435168714176e-05, 'epoch': 0.25} + + 25%|██▍ | 1839/7378 [6:18:58<19:03:32, 12.39s/it] + 25%|██▍ | 1840/7378 [6:19:10<19:00:20, 12.35s/it] + +{'loss': 0.4913, 'learning_rate': 1.75814896815193e-05, 'epoch': 0.25} + + 25%|██▍ | 1840/7378 [6:19:10<19:00:20, 12.35s/it] + 25%|██▍ | 1841/7378 [6:19:22<18:50:02, 12.25s/it] + +{'loss': 0.4826, 'learning_rate': 1.7578626214683515e-05, 'epoch': 0.25} + + 25%|██▍ | 1841/7378 [6:19:22<18:50:02, 12.25s/it] + 25%|██▍ | 1842/7378 [6:19:35<18:53:01, 12.28s/it] + +{'loss': 0.4617, 'learning_rate': 1.7575761287186296e-05, 'epoch': 0.25} + + 25%|██▍ | 1842/7378 [6:19:35<18:53:01, 12.28s/it] + 25%|██▍ | 1843/7378 [6:19:47<18:43:19, 12.18s/it] + +{'loss': 0.4029, 'learning_rate': 1.7572894899579815e-05, 'epoch': 0.25} + + 25%|██▍ | 1843/7378 [6:19:47<18:43:19, 12.18s/it] + 25%|██▍ | 1844/7378 [6:19:59<18:47:46, 12.23s/it] + +{'loss': 0.5465, 'learning_rate': 1.757002705241652e-05, 'epoch': 0.25} + + 25%|██▍ | 1844/7378 [6:19:59<18:47:46, 12.23s/it] + 25%|██▌ | 1845/7378 [6:20:11<18:49:20, 12.25s/it] + +{'loss': 0.46, 'learning_rate': 1.7567157746249148e-05, 'epoch': 0.25} + + 25%|██▌ | 1845/7378 [6:20:11<18:49:20, 12.25s/it] + 25%|██▌ | 1846/7378 [6:20:24<18:49:56, 12.26s/it] + +{'loss': 0.5009, 'learning_rate': 1.7564286981630713e-05, 'epoch': 0.25} + + 25%|██▌ | 1846/7378 [6:20:24<18:49:56, 12.26s/it] + 25%|██▌ | 1847/7378 [6:20:36<18:51:26, 12.27s/it] + +{'loss': 0.4853, 'learning_rate': 1.7561414759114504e-05, 'epoch': 0.25} + + 25%|██▌ | 1847/7378 [6:20:36<18:51:26, 12.27s/it] + 25%|██▌ | 1848/7378 
[6:20:48<18:57:15, 12.34s/it] + +{'loss': 0.5519, 'learning_rate': 1.7558541079254098e-05, 'epoch': 0.25} + + 25%|██▌ | 1848/7378 [6:20:48<18:57:15, 12.34s/it] + 25%|██▌ | 1849/7378 [6:21:01<19:02:36, 12.40s/it] + +{'loss': 0.5294, 'learning_rate': 1.7555665942603363e-05, 'epoch': 0.25} + + 25%|██▌ | 1849/7378 [6:21:01<19:02:36, 12.40s/it] + 25%|██▌ | 1850/7378 [6:21:13<18:52:21, 12.29s/it] + +{'loss': 0.4745, 'learning_rate': 1.755278934971642e-05, 'epoch': 0.25} + + 25%|██▌ | 1850/7378 [6:21:13<18:52:21, 12.29s/it] + 25%|██▌ | 1851/7378 [6:21:25<18:45:31, 12.22s/it] + +{'loss': 0.4755, 'learning_rate': 1.7549911301147697e-05, 'epoch': 0.25} + + 25%|██▌ | 1851/7378 [6:21:25<18:45:31, 12.22s/it] + 25%|██▌ | 1852/7378 [6:21:38<18:54:11, 12.31s/it] + +{'loss': 0.4868, 'learning_rate': 1.754703179745189e-05, 'epoch': 0.25} + + 25%|██▌ | 1852/7378 [6:21:38<18:54:11, 12.31s/it] + 25%|██▌ | 1853/7378 [6:21:50<18:56:33, 12.34s/it] + +{'loss': 0.4496, 'learning_rate': 1.754415083918398e-05, 'epoch': 0.25} + + 25%|██▌ | 1853/7378 [6:21:50<18:56:33, 12.34s/it] + 25%|██▌ | 1854/7378 [6:22:03<19:05:57, 12.45s/it] + +{'loss': 0.4498, 'learning_rate': 1.7541268426899222e-05, 'epoch': 0.25} + + 25%|██▌ | 1854/7378 [6:22:03<19:05:57, 12.45s/it] + 25%|██▌ | 1855/7378 [6:22:15<19:11:27, 12.51s/it] + +{'loss': 0.5099, 'learning_rate': 1.7538384561153162e-05, 'epoch': 0.25} + + 25%|██▌ | 1855/7378 [6:22:15<19:11:27, 12.51s/it] + 25%|██▌ | 1856/7378 [6:22:28<19:05:35, 12.45s/it] + +{'loss': 0.4811, 'learning_rate': 1.753549924250162e-05, 'epoch': 0.25} + + 25%|██▌ | 1856/7378 [6:22:28<19:05:35, 12.45s/it] + 25%|██▌ | 1857/7378 [6:22:40<18:59:53, 12.39s/it] + +{'loss': 0.4327, 'learning_rate': 1.753261247150069e-05, 'epoch': 0.25} + + 25%|██▌ | 1857/7378 [6:22:40<18:59:53, 12.39s/it] + 25%|██▌ | 1858/7378 [6:22:53<19:07:40, 12.47s/it] + +{'loss': 0.507, 'learning_rate': 1.7529724248706754e-05, 'epoch': 0.25} + + 25%|██▌ | 1858/7378 [6:22:53<19:07:40, 12.47s/it] + 25%|██▌ | 1859/7378 
[6:23:05<19:00:27, 12.40s/it] + +{'loss': 0.429, 'learning_rate': 1.7526834574676475e-05, 'epoch': 0.25} + + 25%|██▌ | 1859/7378 [6:23:05<19:00:27, 12.40s/it] + 25%|██▌ | 1860/7378 [6:23:17<19:00:12, 12.40s/it] + +{'loss': 0.4942, 'learning_rate': 1.7523943449966786e-05, 'epoch': 0.25} + + 25%|██▌ | 1860/7378 [6:23:17<19:00:12, 12.40s/it] + 25%|██▌ | 1861/7378 [6:23:29<18:53:42, 12.33s/it] + +{'loss': 0.4775, 'learning_rate': 1.7521050875134916e-05, 'epoch': 0.25} + + 25%|██▌ | 1861/7378 [6:23:29<18:53:42, 12.33s/it] + 25%|██▌ | 1862/7378 [6:23:42<18:55:25, 12.35s/it] + +{'loss': 0.4349, 'learning_rate': 1.751815685073835e-05, 'epoch': 0.25} + + 25%|██▌ | 1862/7378 [6:23:42<18:55:25, 12.35s/it] + 25%|██▌ | 1863/7378 [6:23:54<18:52:23, 12.32s/it] + +{'loss': 0.5423, 'learning_rate': 1.751526137733488e-05, 'epoch': 0.25} + + 25%|██▌ | 1863/7378 [6:23:54<18:52:23, 12.32s/it] + 25%|██▌ | 1864/7378 [6:24:06<18:50:36, 12.30s/it] + +{'loss': 0.503, 'learning_rate': 1.7512364455482552e-05, 'epoch': 0.25} + + 25%|██▌ | 1864/7378 [6:24:06<18:50:36, 12.30s/it] + 25%|██▌ | 1865/7378 [6:24:19<18:49:15, 12.29s/it] + +{'loss': 0.4693, 'learning_rate': 1.750946608573971e-05, 'epoch': 0.25} + + 25%|██▌ | 1865/7378 [6:24:19<18:49:15, 12.29s/it] + 25%|██▌ | 1866/7378 [6:24:31<18:40:25, 12.20s/it] + +{'loss': 0.4702, 'learning_rate': 1.7506566268664963e-05, 'epoch': 0.25} + + 25%|██▌ | 1866/7378 [6:24:31<18:40:25, 12.20s/it] + 25%|██▌ | 1867/7378 [6:24:43<18:48:33, 12.29s/it] + +{'loss': 0.465, 'learning_rate': 1.7503665004817213e-05, 'epoch': 0.25} + + 25%|██▌ | 1867/7378 [6:24:43<18:48:33, 12.29s/it] + 25%|██▌ | 1868/7378 [6:24:55<18:50:24, 12.31s/it] + +{'loss': 0.4648, 'learning_rate': 1.7500762294755624e-05, 'epoch': 0.25} + + 25%|██▌ | 1868/7378 [6:24:55<18:50:24, 12.31s/it] + 25%|██▌ | 1869/7378 [6:25:08<18:53:47, 12.35s/it] + +{'loss': 0.4, 'learning_rate': 1.7497858139039654e-05, 'epoch': 0.25} + + 25%|██▌ | 1869/7378 [6:25:08<18:53:47, 12.35s/it] + 25%|██▌ | 1870/7378 
[6:25:20<18:46:43, 12.27s/it] + +{'loss': 0.5026, 'learning_rate': 1.7494952538229034e-05, 'epoch': 0.25} + + 25%|██▌ | 1870/7378 [6:25:20<18:46:43, 12.27s/it] + 25%|██▌ | 1871/7378 [6:25:33<18:58:03, 12.40s/it] + +{'loss': 0.4635, 'learning_rate': 1.7492045492883764e-05, 'epoch': 0.25} + + 25%|██▌ | 1871/7378 [6:25:33<18:58:03, 12.40s/it] + 25%|██▌ | 1872/7378 [6:25:45<19:00:24, 12.43s/it] + +{'loss': 0.4265, 'learning_rate': 1.7489137003564145e-05, 'epoch': 0.25} + + 25%|██▌ | 1872/7378 [6:25:45<19:00:24, 12.43s/it] + 25%|██▌ | 1873/7378 [6:25:57<18:54:30, 12.37s/it] + +{'loss': 0.5725, 'learning_rate': 1.7486227070830734e-05, 'epoch': 0.25} + + 25%|██▌ | 1873/7378 [6:25:57<18:54:30, 12.37s/it] + 25%|██▌ | 1874/7378 [6:26:09<18:46:54, 12.28s/it] + +{'loss': 0.5489, 'learning_rate': 1.748331569524438e-05, 'epoch': 0.25} + + 25%|██▌ | 1874/7378 [6:26:09<18:46:54, 12.28s/it] + 25%|██▌ | 1875/7378 [6:26:22<18:51:00, 12.33s/it] + +{'loss': 0.4697, 'learning_rate': 1.7480402877366195e-05, 'epoch': 0.25} + + 25%|██▌ | 1875/7378 [6:26:22<18:51:00, 12.33s/it] + 25%|██▌ | 1876/7378 [6:26:34<18:43:47, 12.26s/it] + +{'loss': 0.5055, 'learning_rate': 1.747748861775759e-05, 'epoch': 0.25} + + 25%|██▌ | 1876/7378 [6:26:34<18:43:47, 12.26s/it] + 25%|██▌ | 1877/7378 [6:26:47<19:03:22, 12.47s/it] + +{'loss': 0.5105, 'learning_rate': 1.747457291698024e-05, 'epoch': 0.25} + + 25%|██▌ | 1877/7378 [6:26:47<19:03:22, 12.47s/it] + 25%|██▌ | 1878/7378 [6:27:00<19:07:34, 12.52s/it] + +{'loss': 0.462, 'learning_rate': 1.7471655775596097e-05, 'epoch': 0.25} + + 25%|██▌ | 1878/7378 [6:27:00<19:07:34, 12.52s/it] + 25%|██▌ | 1879/7378 [6:27:12<19:08:27, 12.53s/it] + +{'loss': 0.55, 'learning_rate': 1.7468737194167394e-05, 'epoch': 0.25} + + 25%|██▌ | 1879/7378 [6:27:12<19:08:27, 12.53s/it] + 25%|██▌ | 1880/7378 [6:27:24<19:03:31, 12.48s/it] + +{'loss': 0.4414, 'learning_rate': 1.746581717325665e-05, 'epoch': 0.25} + + 25%|██▌ | 1880/7378 [6:27:25<19:03:31, 12.48s/it] + 25%|██▌ | 1881/7378 
[6:27:37<19:00:12, 12.45s/it] + +{'loss': 0.5229, 'learning_rate': 1.7462895713426647e-05, 'epoch': 0.25} + + 25%|██▌ | 1881/7378 [6:27:37<19:00:12, 12.45s/it] + 26%|██▌ | 1882/7378 [6:27:49<19:01:47, 12.46s/it] + +{'loss': 0.5323, 'learning_rate': 1.7459972815240452e-05, 'epoch': 0.26} + + 26%|██▌ | 1882/7378 [6:27:49<19:01:47, 12.46s/it] + 26%|██▌ | 1883/7378 [6:28:02<19:01:59, 12.47s/it] + +{'loss': 0.4861, 'learning_rate': 1.7457048479261406e-05, 'epoch': 0.26} + + 26%|██▌ | 1883/7378 [6:28:02<19:01:59, 12.47s/it] + 26%|██▌ | 1884/7378 [6:28:14<18:59:00, 12.44s/it] + +{'loss': 0.5024, 'learning_rate': 1.745412270605313e-05, 'epoch': 0.26} + + 26%|██▌ | 1884/7378 [6:28:14<18:59:00, 12.44s/it] + 26%|██▌ | 1885/7378 [6:28:26<18:41:47, 12.25s/it] + +{'loss': 0.5524, 'learning_rate': 1.745119549617952e-05, 'epoch': 0.26} + + 26%|██▌ | 1885/7378 [6:28:26<18:41:47, 12.25s/it] + 26%|██▌ | 1886/7378 [6:28:39<18:49:16, 12.34s/it] + +{'loss': 0.4624, 'learning_rate': 1.7448266850204754e-05, 'epoch': 0.26} + + 26%|██▌ | 1886/7378 [6:28:39<18:49:16, 12.34s/it] + 26%|██▌ | 1887/7378 [6:28:51<18:57:51, 12.43s/it] + +{'loss': 0.4011, 'learning_rate': 1.7445336768693274e-05, 'epoch': 0.26} + + 26%|██▌ | 1887/7378 [6:28:51<18:57:51, 12.43s/it] + 26%|██▌ | 1888/7378 [6:29:04<19:00:19, 12.46s/it] + +{'loss': 0.492, 'learning_rate': 1.744240525220982e-05, 'epoch': 0.26} + + 26%|██▌ | 1888/7378 [6:29:04<19:00:19, 12.46s/it] + 26%|██▌ | 1889/7378 [6:29:17<19:10:27, 12.58s/it] + +{'loss': 0.4561, 'learning_rate': 1.7439472301319385e-05, 'epoch': 0.26} + + 26%|██▌ | 1889/7378 [6:29:17<19:10:27, 12.58s/it] + 26%|██▌ | 1890/7378 [6:29:29<19:07:47, 12.55s/it] + +{'loss': 0.5164, 'learning_rate': 1.7436537916587254e-05, 'epoch': 0.26} + + 26%|██▌ | 1890/7378 [6:29:29<19:07:47, 12.55s/it] + 26%|██▌ | 1891/7378 [6:29:41<18:48:34, 12.34s/it] + +{'loss': 0.4332, 'learning_rate': 1.7433602098578983e-05, 'epoch': 0.26} + + 26%|██▌ | 1891/7378 [6:29:41<18:48:34, 12.34s/it] + 26%|██▌ | 1892/7378 
[6:29:53<18:39:52, 12.25s/it] + +{'loss': 0.446, 'learning_rate': 1.74306648478604e-05, 'epoch': 0.26} + + 26%|██▌ | 1892/7378 [6:29:53<18:39:52, 12.25s/it] + 26%|██▌ | 1893/7378 [6:30:05<18:43:12, 12.29s/it] + +{'loss': 0.4936, 'learning_rate': 1.7427726164997624e-05, 'epoch': 0.26} + + 26%|██▌ | 1893/7378 [6:30:05<18:43:12, 12.29s/it] + 26%|██▌ | 1894/7378 [6:30:18<18:56:09, 12.43s/it] + +{'loss': 0.5262, 'learning_rate': 1.7424786050557036e-05, 'epoch': 0.26} + + 26%|██▌ | 1894/7378 [6:30:18<18:56:09, 12.43s/it] + 26%|██▌ | 1895/7378 [6:30:31<18:57:17, 12.45s/it] + +{'loss': 0.5551, 'learning_rate': 1.7421844505105293e-05, 'epoch': 0.26} + + 26%|██▌ | 1895/7378 [6:30:31<18:57:17, 12.45s/it] + 26%|██▌ | 1896/7378 [6:30:43<18:45:21, 12.32s/it] + +{'loss': 0.4537, 'learning_rate': 1.7418901529209336e-05, 'epoch': 0.26} + + 26%|██▌ | 1896/7378 [6:30:43<18:45:21, 12.32s/it] + 26%|██▌ | 1897/7378 [6:30:55<18:51:10, 12.38s/it] + +{'loss': 0.4935, 'learning_rate': 1.7415957123436373e-05, 'epoch': 0.26} + + 26%|██▌ | 1897/7378 [6:30:55<18:51:10, 12.38s/it] + 26%|██▌ | 1898/7378 [6:31:07<18:47:00, 12.34s/it] + +{'loss': 0.5212, 'learning_rate': 1.7413011288353896e-05, 'epoch': 0.26} + + 26%|██▌ | 1898/7378 [6:31:07<18:47:00, 12.34s/it] + 26%|██▌ | 1899/7378 [6:31:20<18:42:22, 12.29s/it] + +{'loss': 0.4347, 'learning_rate': 1.7410064024529667e-05, 'epoch': 0.26} + + 26%|██▌ | 1899/7378 [6:31:20<18:42:22, 12.29s/it] + 26%|██▌ | 1900/7378 [6:31:32<18:50:21, 12.38s/it] + +{'loss': 0.5089, 'learning_rate': 1.740711533253173e-05, 'epoch': 0.26} + + 26%|██▌ | 1900/7378 [6:31:32<18:50:21, 12.38s/it] + 26%|██▌ | 1901/7378 [6:31:45<18:54:41, 12.43s/it] + +{'loss': 0.4607, 'learning_rate': 1.740416521292839e-05, 'epoch': 0.26} + + 26%|██▌ | 1901/7378 [6:31:45<18:54:41, 12.43s/it] + 26%|██▌ | 1902/7378 [6:31:57<18:49:18, 12.37s/it] + +{'loss': 0.464, 'learning_rate': 1.740121366628824e-05, 'epoch': 0.26} + + 26%|██▌ | 1902/7378 [6:31:57<18:49:18, 12.37s/it] + 26%|██▌ | 1903/7378 
[6:32:10<18:57:40, 12.47s/it] + +{'loss': 0.477, 'learning_rate': 1.7398260693180152e-05, 'epoch': 0.26} + + 26%|██▌ | 1903/7378 [6:32:10<18:57:40, 12.47s/it] + 26%|██▌ | 1904/7378 [6:32:22<18:54:48, 12.44s/it] + +{'loss': 0.421, 'learning_rate': 1.7395306294173254e-05, 'epoch': 0.26} + + 26%|██▌ | 1904/7378 [6:32:22<18:54:48, 12.44s/it] + 26%|██▌ | 1905/7378 [6:32:34<18:47:31, 12.36s/it] + +{'loss': 0.4612, 'learning_rate': 1.7392350469836965e-05, 'epoch': 0.26} + + 26%|██▌ | 1905/7378 [6:32:34<18:47:31, 12.36s/it] + 26%|██▌ | 1906/7378 [6:32:47<18:47:13, 12.36s/it] + +{'loss': 0.5203, 'learning_rate': 1.7389393220740975e-05, 'epoch': 0.26} + + 26%|██▌ | 1906/7378 [6:32:47<18:47:13, 12.36s/it] + 26%|██▌ | 1907/7378 [6:32:59<18:43:37, 12.32s/it] + +{'loss': 0.5036, 'learning_rate': 1.7386434547455246e-05, 'epoch': 0.26} + + 26%|██▌ | 1907/7378 [6:32:59<18:43:37, 12.32s/it] + 26%|██▌ | 1908/7378 [6:33:12<19:00:26, 12.51s/it] + +{'loss': 0.4904, 'learning_rate': 1.7383474450550014e-05, 'epoch': 0.26} + + 26%|██▌ | 1908/7378 [6:33:12<19:00:26, 12.51s/it] + 26%|██▌ | 1909/7378 [6:33:24<18:47:02, 12.36s/it] + +{'loss': 0.5153, 'learning_rate': 1.7380512930595794e-05, 'epoch': 0.26} + + 26%|██▌ | 1909/7378 [6:33:24<18:47:02, 12.36s/it] + 26%|██▌ | 1910/7378 [6:33:36<18:45:15, 12.35s/it] + +{'loss': 0.5521, 'learning_rate': 1.7377549988163373e-05, 'epoch': 0.26} + + 26%|██▌ | 1910/7378 [6:33:36<18:45:15, 12.35s/it] + 26%|██▌ | 1911/7378 [6:33:48<18:40:15, 12.29s/it] + +{'loss': 0.4635, 'learning_rate': 1.7374585623823808e-05, 'epoch': 0.26} + + 26%|██▌ | 1911/7378 [6:33:48<18:40:15, 12.29s/it] + 26%|██▌ | 1912/7378 [6:34:00<18:39:25, 12.29s/it] + +{'loss': 0.4504, 'learning_rate': 1.7371619838148436e-05, 'epoch': 0.26} + + 26%|██▌ | 1912/7378 [6:34:01<18:39:25, 12.29s/it] + 26%|██▌ | 1913/7378 [6:34:12<18:30:52, 12.20s/it] + +{'loss': 0.4936, 'learning_rate': 1.736865263170887e-05, 'epoch': 0.26} + + 26%|██▌ | 1913/7378 [6:34:12<18:30:52, 12.20s/it] + 26%|██▌ | 1914/7378 
[6:34:25<18:39:20, 12.29s/it] + +{'loss': 0.5339, 'learning_rate': 1.7365684005076985e-05, 'epoch': 0.26} + + 26%|██▌ | 1914/7378 [6:34:25<18:39:20, 12.29s/it] + 26%|██▌ | 1915/7378 [6:34:37<18:40:25, 12.31s/it] + +{'loss': 0.5042, 'learning_rate': 1.7362713958824943e-05, 'epoch': 0.26} + + 26%|██▌ | 1915/7378 [6:34:37<18:40:25, 12.31s/it] + 26%|██▌ | 1916/7378 [6:34:50<18:46:06, 12.37s/it] + +{'loss': 0.4955, 'learning_rate': 1.735974249352517e-05, 'epoch': 0.26} + + 26%|██▌ | 1916/7378 [6:34:50<18:46:06, 12.37s/it] + 26%|██▌ | 1917/7378 [6:35:02<18:44:22, 12.35s/it] + +{'loss': 0.4842, 'learning_rate': 1.7356769609750374e-05, 'epoch': 0.26} + + 26%|██▌ | 1917/7378 [6:35:02<18:44:22, 12.35s/it] + 26%|██▌ | 1918/7378 [6:35:14<18:37:10, 12.28s/it] + +{'loss': 0.4595, 'learning_rate': 1.7353795308073526e-05, 'epoch': 0.26} + + 26%|██▌ | 1918/7378 [6:35:14<18:37:10, 12.28s/it] + 26%|██▌ | 1919/7378 [6:35:27<18:38:00, 12.29s/it] + +{'loss': 0.4904, 'learning_rate': 1.735081958906788e-05, 'epoch': 0.26} + + 26%|██▌ | 1919/7378 [6:35:27<18:38:00, 12.29s/it] + 26%|██▌ | 1920/7378 [6:35:39<18:37:58, 12.29s/it] + +{'loss': 0.4899, 'learning_rate': 1.7347842453306953e-05, 'epoch': 0.26} + + 26%|██▌ | 1920/7378 [6:35:39<18:37:58, 12.29s/it] + 26%|██▌ | 1921/7378 [6:35:51<18:37:53, 12.29s/it] + +{'loss': 0.476, 'learning_rate': 1.7344863901364554e-05, 'epoch': 0.26} + + 26%|██▌ | 1921/7378 [6:35:51<18:37:53, 12.29s/it] + 26%|██▌ | 1922/7378 [6:36:04<18:57:17, 12.51s/it] + +{'loss': 0.5134, 'learning_rate': 1.734188393381474e-05, 'epoch': 0.26} + + 26%|██▌ | 1922/7378 [6:36:04<18:57:17, 12.51s/it] + 26%|██▌ | 1923/7378 [6:36:16<18:47:49, 12.41s/it] + +{'loss': 0.5135, 'learning_rate': 1.733890255123186e-05, 'epoch': 0.26} + + 26%|██▌ | 1923/7378 [6:36:16<18:47:49, 12.41s/it] + 26%|██▌ | 1924/7378 [6:36:29<18:45:57, 12.39s/it] + +{'loss': 0.4438, 'learning_rate': 1.7335919754190523e-05, 'epoch': 0.26} + + 26%|██▌ | 1924/7378 [6:36:29<18:45:57, 12.39s/it] + 26%|██▌ | 1925/7378 
[6:36:41<18:47:36, 12.41s/it] + +{'loss': 0.4771, 'learning_rate': 1.7332935543265625e-05, 'epoch': 0.26} + + 26%|██▌ | 1925/7378 [6:36:41<18:47:36, 12.41s/it] + 26%|██▌ | 1926/7378 [6:36:54<18:49:08, 12.43s/it] + +{'loss': 0.4702, 'learning_rate': 1.7329949919032315e-05, 'epoch': 0.26} + + 26%|██▌ | 1926/7378 [6:36:54<18:49:08, 12.43s/it] + 26%|██▌ | 1927/7378 [6:37:06<18:57:23, 12.52s/it] + +{'loss': 0.5359, 'learning_rate': 1.732696288206603e-05, 'epoch': 0.26} + + 26%|██▌ | 1927/7378 [6:37:06<18:57:23, 12.52s/it] + 26%|██▌ | 1928/7378 [6:37:19<18:48:46, 12.43s/it] + +{'loss': 0.4617, 'learning_rate': 1.7323974432942478e-05, 'epoch': 0.26} + + 26%|██▌ | 1928/7378 [6:37:19<18:48:46, 12.43s/it] + 26%|██▌ | 1929/7378 [6:37:31<18:36:25, 12.29s/it] + +{'loss': 0.522, 'learning_rate': 1.7320984572237636e-05, 'epoch': 0.26} + + 26%|██▌ | 1929/7378 [6:37:31<18:36:25, 12.29s/it] + 26%|██▌ | 1930/7378 [6:37:43<18:42:59, 12.37s/it] + +{'loss': 0.506, 'learning_rate': 1.7317993300527747e-05, 'epoch': 0.26} + + 26%|██▌ | 1930/7378 [6:37:43<18:42:59, 12.37s/it] + 26%|██▌ | 1931/7378 [6:37:55<18:39:19, 12.33s/it] + +{'loss': 0.4778, 'learning_rate': 1.7315000618389335e-05, 'epoch': 0.26} + + 26%|██▌ | 1931/7378 [6:37:55<18:39:19, 12.33s/it] + 26%|██▌ | 1932/7378 [6:38:07<18:33:25, 12.27s/it] + +{'loss': 0.4739, 'learning_rate': 1.7312006526399192e-05, 'epoch': 0.26} + + 26%|██▌ | 1932/7378 [6:38:07<18:33:25, 12.27s/it] + 26%|██▌ | 1933/7378 [6:38:20<18:36:22, 12.30s/it] + +{'loss': 0.4627, 'learning_rate': 1.7309011025134385e-05, 'epoch': 0.26} + + 26%|██▌ | 1933/7378 [6:38:20<18:36:22, 12.30s/it] + 26%|██▌ | 1934/7378 [6:38:32<18:36:28, 12.31s/it] + +{'loss': 0.4975, 'learning_rate': 1.7306014115172244e-05, 'epoch': 0.26} + + 26%|██▌ | 1934/7378 [6:38:32<18:36:28, 12.31s/it] + 26%|██▌ | 1935/7378 [6:38:45<18:41:13, 12.36s/it] + +{'loss': 0.4351, 'learning_rate': 1.730301579709038e-05, 'epoch': 0.26} + + 26%|██▌ | 1935/7378 [6:38:45<18:41:13, 12.36s/it] + 26%|██▌ | 1936/7378 
[6:38:57<18:50:20, 12.46s/it] + +{'loss': 0.568, 'learning_rate': 1.7300016071466674e-05, 'epoch': 0.26} + + 26%|██▌ | 1936/7378 [6:38:57<18:50:20, 12.46s/it] + 26%|██▋ | 1937/7378 [6:39:10<18:47:55, 12.44s/it] + +{'loss': 0.5103, 'learning_rate': 1.7297014938879276e-05, 'epoch': 0.26} + + 26%|██▋ | 1937/7378 [6:39:10<18:47:55, 12.44s/it] + 26%|██▋ | 1938/7378 [6:39:22<18:46:21, 12.42s/it] + +{'loss': 0.5312, 'learning_rate': 1.7294012399906603e-05, 'epoch': 0.26} + + 26%|██▋ | 1938/7378 [6:39:22<18:46:21, 12.42s/it] + 26%|██▋ | 1939/7378 [6:39:34<18:35:44, 12.31s/it] + +{'loss': 0.4934, 'learning_rate': 1.7291008455127346e-05, 'epoch': 0.26} + + 26%|██▋ | 1939/7378 [6:39:34<18:35:44, 12.31s/it] + 26%|██▋ | 1940/7378 [6:39:46<18:32:33, 12.28s/it] + +{'loss': 0.5495, 'learning_rate': 1.7288003105120474e-05, 'epoch': 0.26} + + 26%|██▋ | 1940/7378 [6:39:46<18:32:33, 12.28s/it] + 26%|██▋ | 1941/7378 [6:39:59<18:43:59, 12.40s/it] + +{'loss': 0.4251, 'learning_rate': 1.728499635046522e-05, 'epoch': 0.26} + + 26%|██▋ | 1941/7378 [6:39:59<18:43:59, 12.40s/it] + 26%|██▋ | 1942/7378 [6:40:11<18:44:19, 12.41s/it] + +{'loss': 0.5064, 'learning_rate': 1.7281988191741085e-05, 'epoch': 0.26} + + 26%|██▋ | 1942/7378 [6:40:11<18:44:19, 12.41s/it] + 26%|██▋ | 1943/7378 [6:40:24<18:50:48, 12.48s/it] + +{'loss': 0.5174, 'learning_rate': 1.727897862952785e-05, 'epoch': 0.26} + + 26%|██▋ | 1943/7378 [6:40:24<18:50:48, 12.48s/it] + 26%|██▋ | 1944/7378 [6:40:37<18:49:11, 12.47s/it] + +{'loss': 0.4434, 'learning_rate': 1.7275967664405558e-05, 'epoch': 0.26} + + 26%|██▋ | 1944/7378 [6:40:37<18:49:11, 12.47s/it] + 26%|██▋ | 1945/7378 [6:40:49<18:45:12, 12.43s/it] + +{'loss': 0.5292, 'learning_rate': 1.7272955296954524e-05, 'epoch': 0.26} + + 26%|██▋ | 1945/7378 [6:40:49<18:45:12, 12.43s/it] + 26%|██▋ | 1946/7378 [6:41:01<18:37:24, 12.34s/it] + +{'loss': 0.4929, 'learning_rate': 1.7269941527755337e-05, 'epoch': 0.26} + + 26%|██▋ | 1946/7378 [6:41:01<18:37:24, 12.34s/it] + 26%|██▋ | 1947/7378 
[6:41:14<18:48:08, 12.46s/it] + +{'loss': 0.4677, 'learning_rate': 1.7266926357388852e-05, 'epoch': 0.26} + + 26%|██▋ | 1947/7378 [6:41:14<18:48:08, 12.46s/it] + 26%|██▋ | 1948/7378 [6:41:26<18:47:13, 12.46s/it] + +{'loss': 0.5477, 'learning_rate': 1.7263909786436194e-05, 'epoch': 0.26} + + 26%|██▋ | 1948/7378 [6:41:26<18:47:13, 12.46s/it] + 26%|██▋ | 1949/7378 [6:41:39<18:50:06, 12.49s/it] + +{'loss': 0.5, 'learning_rate': 1.726089181547877e-05, 'epoch': 0.26} + + 26%|██▋ | 1949/7378 [6:41:39<18:50:06, 12.49s/it] + 26%|██▋ | 1950/7378 [6:41:51<18:53:28, 12.53s/it] + +{'loss': 0.5263, 'learning_rate': 1.7257872445098232e-05, 'epoch': 0.26} + + 26%|██▋ | 1950/7378 [6:41:51<18:53:28, 12.53s/it] + 26%|██▋ | 1951/7378 [6:42:04<18:52:59, 12.53s/it] + +{'loss': 0.4907, 'learning_rate': 1.7254851675876526e-05, 'epoch': 0.26} + + 26%|██▋ | 1951/7378 [6:42:04<18:52:59, 12.53s/it] + 26%|██▋ | 1952/7378 [6:42:16<18:44:30, 12.43s/it] + +{'loss': 0.4905, 'learning_rate': 1.7251829508395855e-05, 'epoch': 0.26} + + 26%|██▋ | 1952/7378 [6:42:16<18:44:30, 12.43s/it] + 26%|██▋ | 1953/7378 [6:42:28<18:39:41, 12.38s/it] + +{'loss': 0.5035, 'learning_rate': 1.7248805943238696e-05, 'epoch': 0.26} + + 26%|██▋ | 1953/7378 [6:42:28<18:39:41, 12.38s/it] + 26%|██▋ | 1954/7378 [6:42:41<18:34:25, 12.33s/it] + +{'loss': 0.5185, 'learning_rate': 1.724578098098779e-05, 'epoch': 0.26} + + 26%|██▋ | 1954/7378 [6:42:41<18:34:25, 12.33s/it] + 26%|██▋ | 1955/7378 [6:42:53<18:30:36, 12.29s/it] + +{'loss': 0.4846, 'learning_rate': 1.7242754622226156e-05, 'epoch': 0.26} + + 26%|██▋ | 1955/7378 [6:42:53<18:30:36, 12.29s/it] + 27%|██▋ | 1956/7378 [6:43:05<18:35:47, 12.35s/it] + +{'loss': 0.5149, 'learning_rate': 1.7239726867537072e-05, 'epoch': 0.27} + + 27%|██▋ | 1956/7378 [6:43:05<18:35:47, 12.35s/it] + 27%|██▋ | 1957/7378 [6:43:18<18:41:48, 12.42s/it] + +{'loss': 0.5196, 'learning_rate': 1.7236697717504095e-05, 'epoch': 0.27} + + 27%|██▋ | 1957/7378 [6:43:18<18:41:48, 12.42s/it] + 27%|██▋ | 1958/7378 
[6:43:30<18:44:09, 12.44s/it] + +{'loss': 0.4195, 'learning_rate': 1.7233667172711045e-05, 'epoch': 0.27} + + 27%|██▋ | 1958/7378 [6:43:30<18:44:09, 12.44s/it] + 27%|██▋ | 1959/7378 [6:43:43<18:37:01, 12.37s/it] + +{'loss': 0.5215, 'learning_rate': 1.723063523374201e-05, 'epoch': 0.27} + + 27%|██▋ | 1959/7378 [6:43:43<18:37:01, 12.37s/it] + 27%|██▋ | 1960/7378 [6:43:55<18:36:58, 12.37s/it] + +{'loss': 0.4795, 'learning_rate': 1.722760190118135e-05, 'epoch': 0.27} + + 27%|██▋ | 1960/7378 [6:43:55<18:36:58, 12.37s/it] + 27%|██▋ | 1961/7378 [6:44:08<18:43:25, 12.44s/it] + +{'loss': 0.4709, 'learning_rate': 1.7224567175613692e-05, 'epoch': 0.27} + + 27%|██▋ | 1961/7378 [6:44:08<18:43:25, 12.44s/it] + 27%|██▋ | 1962/7378 [6:44:20<18:49:22, 12.51s/it] + +{'loss': 0.4268, 'learning_rate': 1.7221531057623935e-05, 'epoch': 0.27} + + 27%|██▋ | 1962/7378 [6:44:20<18:49:22, 12.51s/it] + 27%|██▋ | 1963/7378 [6:44:33<19:07:53, 12.72s/it] + +{'loss': 0.4985, 'learning_rate': 1.7218493547797236e-05, 'epoch': 0.27} + + 27%|██▋ | 1963/7378 [6:44:33<19:07:53, 12.72s/it] + 27%|██▋ | 1964/7378 [6:44:46<18:57:29, 12.61s/it] + +{'loss': 0.4523, 'learning_rate': 1.7215454646719036e-05, 'epoch': 0.27} + + 27%|██▋ | 1964/7378 [6:44:46<18:57:29, 12.61s/it] + 27%|██▋ | 1965/7378 [6:44:58<18:46:21, 12.48s/it] + +{'loss': 0.4423, 'learning_rate': 1.721241435497503e-05, 'epoch': 0.27} + + 27%|██▋ | 1965/7378 [6:44:58<18:46:21, 12.48s/it] + 27%|██▋ | 1966/7378 [6:45:10<18:45:53, 12.48s/it] + +{'loss': 0.4463, 'learning_rate': 1.7209372673151186e-05, 'epoch': 0.27} + + 27%|██▋ | 1966/7378 [6:45:10<18:45:53, 12.48s/it] + 27%|██▋ | 1967/7378 [6:45:23<18:59:21, 12.63s/it] + +{'loss': 0.5125, 'learning_rate': 1.7206329601833746e-05, 'epoch': 0.27} + + 27%|██▋ | 1967/7378 [6:45:23<18:59:21, 12.63s/it] + 27%|██▋ | 1968/7378 [6:45:36<18:46:12, 12.49s/it] + +{'loss': 0.4503, 'learning_rate': 1.7203285141609205e-05, 'epoch': 0.27} + + 27%|██▋ | 1968/7378 [6:45:36<18:46:12, 12.49s/it] + 27%|██▋ | 1969/7378 
[6:45:48<18:47:05, 12.50s/it] + +{'loss': 0.5237, 'learning_rate': 1.7200239293064345e-05, 'epoch': 0.27} + + 27%|██▋ | 1969/7378 [6:45:48<18:47:05, 12.50s/it] + 27%|██▋ | 1970/7378 [6:46:01<18:50:19, 12.54s/it] + +{'loss': 0.4827, 'learning_rate': 1.71971920567862e-05, 'epoch': 0.27} + + 27%|██▋ | 1970/7378 [6:46:01<18:50:19, 12.54s/it] + 27%|██▋ | 1971/7378 [6:46:14<18:58:31, 12.63s/it] + +{'loss': 0.5924, 'learning_rate': 1.7194143433362076e-05, 'epoch': 0.27} + + 27%|██▋ | 1971/7378 [6:46:14<18:58:31, 12.63s/it] + 27%|██▋ | 1972/7378 [6:46:26<18:46:59, 12.51s/it] + +{'loss': 0.4987, 'learning_rate': 1.7191093423379555e-05, 'epoch': 0.27} + + 27%|██▋ | 1972/7378 [6:46:26<18:46:59, 12.51s/it] + 27%|██▋ | 1973/7378 [6:46:38<18:43:17, 12.47s/it] + +{'loss': 0.5563, 'learning_rate': 1.718804202742647e-05, 'epoch': 0.27} + + 27%|██▋ | 1973/7378 [6:46:38<18:43:17, 12.47s/it] + 27%|██▋ | 1974/7378 [6:46:50<18:26:35, 12.29s/it] + +{'loss': 0.5275, 'learning_rate': 1.718498924609093e-05, 'epoch': 0.27} + + 27%|██▋ | 1974/7378 [6:46:50<18:26:35, 12.29s/it] + 27%|██▋ | 1975/7378 [6:47:03<18:35:22, 12.39s/it] + +{'loss': 0.5007, 'learning_rate': 1.7181935079961318e-05, 'epoch': 0.27} + + 27%|██▋ | 1975/7378 [6:47:03<18:35:22, 12.39s/it] + 27%|██▋ | 1976/7378 [6:47:15<18:32:04, 12.35s/it] + +{'loss': 0.503, 'learning_rate': 1.717887952962627e-05, 'epoch': 0.27} + + 27%|██▋ | 1976/7378 [6:47:15<18:32:04, 12.35s/it] + 27%|██▋ | 1977/7378 [6:47:27<18:33:28, 12.37s/it] + +{'loss': 0.5212, 'learning_rate': 1.71758225956747e-05, 'epoch': 0.27} + + 27%|██▋ | 1977/7378 [6:47:27<18:33:28, 12.37s/it] + 27%|██▋ | 1978/7378 [6:47:40<18:29:49, 12.33s/it] + +{'loss': 0.4963, 'learning_rate': 1.7172764278695782e-05, 'epoch': 0.27} + + 27%|██▋ | 1978/7378 [6:47:40<18:29:49, 12.33s/it] + 27%|██▋ | 1979/7378 [6:47:52<18:33:00, 12.37s/it] + +{'loss': 0.4549, 'learning_rate': 1.7169704579278955e-05, 'epoch': 0.27} + + 27%|██▋ | 1979/7378 [6:47:52<18:33:00, 12.37s/it] + 27%|██▋ | 1980/7378 
[6:48:04<18:16:11, 12.18s/it] + +{'loss': 0.5225, 'learning_rate': 1.7166643498013936e-05, 'epoch': 0.27} + + 27%|██▋ | 1980/7378 [6:48:04<18:16:11, 12.18s/it] + 27%|██▋ | 1981/7378 [6:48:16<18:17:28, 12.20s/it] + +{'loss': 0.4634, 'learning_rate': 1.7163581035490695e-05, 'epoch': 0.27} + + 27%|██▋ | 1981/7378 [6:48:16<18:17:28, 12.20s/it] + 27%|██▋ | 1982/7378 [6:48:29<18:25:05, 12.29s/it] + +{'loss': 0.4734, 'learning_rate': 1.7160517192299474e-05, 'epoch': 0.27} + + 27%|██▋ | 1982/7378 [6:48:29<18:25:05, 12.29s/it] + 27%|██▋ | 1983/7378 [6:48:41<18:20:12, 12.24s/it] + +{'loss': 0.5627, 'learning_rate': 1.7157451969030786e-05, 'epoch': 0.27} + + 27%|██▋ | 1983/7378 [6:48:41<18:20:12, 12.24s/it] + 27%|██▋ | 1984/7378 [6:48:53<18:21:37, 12.25s/it] + +{'loss': 0.4567, 'learning_rate': 1.715438536627539e-05, 'epoch': 0.27} + + 27%|██▋ | 1984/7378 [6:48:53<18:21:37, 12.25s/it] + 27%|██▋ | 1985/7378 [6:49:05<18:23:47, 12.28s/it] + +{'loss': 0.4549, 'learning_rate': 1.7151317384624345e-05, 'epoch': 0.27} + + 27%|██▋ | 1985/7378 [6:49:05<18:23:47, 12.28s/it] + 27%|██▋ | 1986/7378 [6:49:18<18:33:49, 12.39s/it] + +{'loss': 0.5684, 'learning_rate': 1.7148248024668944e-05, 'epoch': 0.27} + + 27%|██▋ | 1986/7378 [6:49:18<18:33:49, 12.39s/it] + 27%|██▋ | 1987/7378 [6:49:30<18:27:43, 12.33s/it] + +{'loss': 0.4448, 'learning_rate': 1.7145177287000763e-05, 'epoch': 0.27} + + 27%|██▋ | 1987/7378 [6:49:30<18:27:43, 12.33s/it] + 27%|██▋ | 1988/7378 [6:49:42<18:20:50, 12.25s/it] + +{'loss': 0.5191, 'learning_rate': 1.7142105172211637e-05, 'epoch': 0.27} + + 27%|██▋ | 1988/7378 [6:49:42<18:20:50, 12.25s/it] + 27%|██▋ | 1989/7378 [6:49:55<18:31:40, 12.38s/it] + +{'loss': 0.4795, 'learning_rate': 1.7139031680893667e-05, 'epoch': 0.27} + + 27%|██▋ | 1989/7378 [6:49:55<18:31:40, 12.38s/it] + 27%|██▋ | 1990/7378 [6:50:07<18:27:10, 12.33s/it] + +{'loss': 0.4569, 'learning_rate': 1.7135956813639222e-05, 'epoch': 0.27} + + 27%|██▋ | 1990/7378 [6:50:07<18:27:10, 12.33s/it] + 27%|██▋ | 
1991/7378 [6:50:19<18:17:19, 12.22s/it] + +{'loss': 0.3757, 'learning_rate': 1.7132880571040934e-05, 'epoch': 0.27} + + 27%|██▋ | 1991/7378 [6:50:19<18:17:19, 12.22s/it] + 27%|██▋ | 1992/7378 [6:50:31<18:20:18, 12.26s/it] + +{'loss': 0.4654, 'learning_rate': 1.71298029536917e-05, 'epoch': 0.27} + + 27%|██▋ | 1992/7378 [6:50:31<18:20:18, 12.26s/it] + 27%|██▋ | 1993/7378 [6:50:43<18:13:01, 12.18s/it] + +{'loss': 0.5211, 'learning_rate': 1.712672396218468e-05, 'epoch': 0.27} + + 27%|██▋ | 1993/7378 [6:50:43<18:13:01, 12.18s/it] + 27%|██▋ | 1994/7378 [6:50:56<18:14:13, 12.19s/it] + +{'loss': 0.4892, 'learning_rate': 1.712364359711331e-05, 'epoch': 0.27} + + 27%|██▋ | 1994/7378 [6:50:56<18:14:13, 12.19s/it] + 27%|██▋ | 1995/7378 [6:51:08<18:09:02, 12.14s/it] + +{'loss': 0.4911, 'learning_rate': 1.712056185907127e-05, 'epoch': 0.27} + + 27%|██▋ | 1995/7378 [6:51:08<18:09:02, 12.14s/it] + 27%|██▋ | 1996/7378 [6:51:20<18:17:18, 12.23s/it] + +{'loss': 0.5063, 'learning_rate': 1.7117478748652527e-05, 'epoch': 0.27} + + 27%|██▋ | 1996/7378 [6:51:20<18:17:18, 12.23s/it] + 27%|██▋ | 1997/7378 [6:51:33<18:23:20, 12.30s/it] + +{'loss': 0.4799, 'learning_rate': 1.7114394266451297e-05, 'epoch': 0.27} + + 27%|██▋ | 1997/7378 [6:51:33<18:23:20, 12.30s/it] + 27%|██▋ | 1998/7378 [6:51:45<18:13:39, 12.20s/it] + +{'loss': 0.4854, 'learning_rate': 1.7111308413062063e-05, 'epoch': 0.27} + + 27%|██▋ | 1998/7378 [6:51:45<18:13:39, 12.20s/it] + 27%|██▋ | 1999/7378 [6:51:57<18:11:04, 12.17s/it] + +{'loss': 0.5275, 'learning_rate': 1.7108221189079584e-05, 'epoch': 0.27} + + 27%|██▋ | 1999/7378 [6:51:57<18:11:04, 12.17s/it] + 27%|██▋ | 2000/7378 [6:52:09<18:11:40, 12.18s/it] + +{'loss': 0.3895, 'learning_rate': 1.7105132595098868e-05, 'epoch': 0.27} + + 27%|██▋ | 2000/7378 [6:52:09<18:11:40, 12.18s/it] + 27%|██▋ | 2001/7378 [6:52:21<18:12:08, 12.19s/it] + +{'loss': 0.4903, 'learning_rate': 1.710204263171519e-05, 'epoch': 0.27} + + 27%|██▋ | 2001/7378 [6:52:21<18:12:08, 12.19s/it] + 27%|██▋ | 
2002/7378 [6:52:34<18:23:02, 12.31s/it] + +{'loss': 0.4539, 'learning_rate': 1.7098951299524104e-05, 'epoch': 0.27} + + 27%|██▋ | 2002/7378 [6:52:34<18:23:02, 12.31s/it] + 27%|██▋ | 2003/7378 [6:52:46<18:19:48, 12.28s/it] + +{'loss': 0.4645, 'learning_rate': 1.70958585991214e-05, 'epoch': 0.27} + + 27%|██▋ | 2003/7378 [6:52:46<18:19:48, 12.28s/it] + 27%|██▋ | 2004/7378 [6:52:58<18:08:30, 12.15s/it] + +{'loss': 0.4668, 'learning_rate': 1.7092764531103158e-05, 'epoch': 0.27} + + 27%|██▋ | 2004/7378 [6:52:58<18:08:30, 12.15s/it] + 27%|██▋ | 2005/7378 [6:53:10<18:18:28, 12.27s/it] + +{'loss': 0.4291, 'learning_rate': 1.7089669096065705e-05, 'epoch': 0.27} + + 27%|██▋ | 2005/7378 [6:53:10<18:18:28, 12.27s/it] + 27%|██▋ | 2006/7378 [6:53:23<18:20:58, 12.30s/it] + +{'loss': 0.5779, 'learning_rate': 1.7086572294605642e-05, 'epoch': 0.27} + + 27%|██▋ | 2006/7378 [6:53:23<18:20:58, 12.30s/it] + 27%|██▋ | 2007/7378 [6:53:35<18:17:01, 12.26s/it] + +{'loss': 0.4354, 'learning_rate': 1.708347412731983e-05, 'epoch': 0.27} + + 27%|██▋ | 2007/7378 [6:53:35<18:17:01, 12.26s/it] + 27%|██▋ | 2008/7378 [6:53:47<18:18:10, 12.27s/it] + +{'loss': 0.5446, 'learning_rate': 1.7080374594805393e-05, 'epoch': 0.27} + + 27%|██▋ | 2008/7378 [6:53:47<18:18:10, 12.27s/it] + 27%|██▋ | 2009/7378 [6:53:59<18:13:58, 12.23s/it] + +{'loss': 0.4501, 'learning_rate': 1.7077273697659706e-05, 'epoch': 0.27} + + 27%|██▋ | 2009/7378 [6:53:59<18:13:58, 12.23s/it] + 27%|██▋ | 2010/7378 [6:54:11<18:05:34, 12.13s/it] + +{'loss': 0.5244, 'learning_rate': 1.7074171436480432e-05, 'epoch': 0.27} + + 27%|██▋ | 2010/7378 [6:54:11<18:05:34, 12.13s/it] + 27%|██▋ | 2011/7378 [6:54:23<18:01:53, 12.09s/it] + +{'loss': 0.4494, 'learning_rate': 1.7071067811865477e-05, 'epoch': 0.27} + + 27%|██▋ | 2011/7378 [6:54:23<18:01:53, 12.09s/it] + 27%|██▋ | 2012/7378 [6:54:35<17:58:49, 12.06s/it] + +{'loss': 0.4437, 'learning_rate': 1.7067962824413016e-05, 'epoch': 0.27} + + 27%|██▋ | 2012/7378 [6:54:35<17:58:49, 12.06s/it] + 27%|██▋ | 
2013/7378 [6:54:48<18:11:25, 12.21s/it] + +{'loss': 0.4848, 'learning_rate': 1.706485647472149e-05, 'epoch': 0.27} + + 27%|██▋ | 2013/7378 [6:54:48<18:11:25, 12.21s/it] + 27%|██▋ | 2014/7378 [6:55:00<18:17:18, 12.27s/it] + +{'loss': 0.4814, 'learning_rate': 1.7061748763389593e-05, 'epoch': 0.27} + + 27%|██▋ | 2014/7378 [6:55:00<18:17:18, 12.27s/it] + 27%|██▋ | 2015/7378 [6:55:13<18:26:28, 12.38s/it] + +{'loss': 0.464, 'learning_rate': 1.7058639691016295e-05, 'epoch': 0.27} + + 27%|██▋ | 2015/7378 [6:55:13<18:26:28, 12.38s/it] + 27%|██▋ | 2016/7378 [6:55:25<18:30:30, 12.43s/it] + +{'loss': 0.4934, 'learning_rate': 1.7055529258200815e-05, 'epoch': 0.27} + + 27%|██▋ | 2016/7378 [6:55:25<18:30:30, 12.43s/it] + 27%|██▋ | 2017/7378 [6:55:38<18:35:36, 12.49s/it] + +{'loss': 0.5042, 'learning_rate': 1.7052417465542643e-05, 'epoch': 0.27} + + 27%|██▋ | 2017/7378 [6:55:38<18:35:36, 12.49s/it] + 27%|██▋ | 2018/7378 [6:55:50<18:33:03, 12.46s/it] + +{'loss': 0.4177, 'learning_rate': 1.7049304313641532e-05, 'epoch': 0.27} + + 27%|██▋ | 2018/7378 [6:55:50<18:33:03, 12.46s/it] + 27%|██▋ | 2019/7378 [6:56:02<18:26:25, 12.39s/it] + +{'loss': 0.5027, 'learning_rate': 1.7046189803097483e-05, 'epoch': 0.27} + + 27%|██▋ | 2019/7378 [6:56:02<18:26:25, 12.39s/it] + 27%|██▋ | 2020/7378 [6:56:15<18:25:25, 12.38s/it] + +{'loss': 0.4554, 'learning_rate': 1.704307393451078e-05, 'epoch': 0.27} + + 27%|██▋ | 2020/7378 [6:56:15<18:25:25, 12.38s/it] + 27%|██▋ | 2021/7378 [6:56:27<18:19:58, 12.32s/it] + +{'loss': 0.5271, 'learning_rate': 1.7039956708481948e-05, 'epoch': 0.27} + + 27%|██▋ | 2021/7378 [6:56:27<18:19:58, 12.32s/it] + 27%|██▋ | 2022/7378 [6:56:39<18:20:04, 12.32s/it] + +{'loss': 0.52, 'learning_rate': 1.703683812561179e-05, 'epoch': 0.27} + + 27%|██▋ | 2022/7378 [6:56:39<18:20:04, 12.32s/it] + 27%|██▋ | 2023/7378 [6:56:52<18:27:57, 12.41s/it] + +{'loss': 0.4884, 'learning_rate': 1.7033718186501366e-05, 'epoch': 0.27} + + 27%|██▋ | 2023/7378 [6:56:52<18:27:57, 12.41s/it] + 27%|██▋ | 
2024/7378 [6:57:04<18:22:48, 12.36s/it] + +{'loss': 0.4589, 'learning_rate': 1.703059689175199e-05, 'epoch': 0.27} + + 27%|██▋ | 2024/7378 [6:57:04<18:22:48, 12.36s/it] + 27%|██▋ | 2025/7378 [6:57:17<18:26:26, 12.40s/it] + +{'loss': 0.501, 'learning_rate': 1.7027474241965242e-05, 'epoch': 0.27} + + 27%|██▋ | 2025/7378 [6:57:17<18:26:26, 12.40s/it] + 27%|██▋ | 2026/7378 [6:57:29<18:29:02, 12.43s/it] + +{'loss': 0.4918, 'learning_rate': 1.7024350237742967e-05, 'epoch': 0.27} + + 27%|██▋ | 2026/7378 [6:57:29<18:29:02, 12.43s/it] + 27%|██▋ | 2027/7378 [6:57:41<18:21:17, 12.35s/it] + +{'loss': 0.445, 'learning_rate': 1.702122487968727e-05, 'epoch': 0.27} + + 27%|██▋ | 2027/7378 [6:57:41<18:21:17, 12.35s/it] + 27%|██▋ | 2028/7378 [6:57:53<18:09:17, 12.22s/it] + +{'loss': 0.5089, 'learning_rate': 1.701809816840051e-05, 'epoch': 0.27} + + 27%|██▋ | 2028/7378 [6:57:53<18:09:17, 12.22s/it] + 28%|██▊ | 2029/7378 [6:58:06<18:11:31, 12.24s/it] + +{'loss': 0.4892, 'learning_rate': 1.7014970104485316e-05, 'epoch': 0.28} + + 28%|██▊ | 2029/7378 [6:58:06<18:11:31, 12.24s/it] + 28%|██▊ | 2030/7378 [6:58:18<18:20:04, 12.34s/it] + +{'loss': 0.5334, 'learning_rate': 1.701184068854457e-05, 'epoch': 0.28} + + 28%|██▊ | 2030/7378 [6:58:18<18:20:04, 12.34s/it] + 28%|██▊ | 2031/7378 [6:58:30<18:12:21, 12.26s/it] + +{'loss': 0.45, 'learning_rate': 1.7008709921181415e-05, 'epoch': 0.28} + + 28%|██▊ | 2031/7378 [6:58:30<18:12:21, 12.26s/it] + 28%|██▊ | 2032/7378 [6:58:43<18:15:23, 12.29s/it] + +{'loss': 0.5065, 'learning_rate': 1.7005577802999264e-05, 'epoch': 0.28} + + 28%|██▊ | 2032/7378 [6:58:43<18:15:23, 12.29s/it] + 28%|██▊ | 2033/7378 [6:58:55<18:11:45, 12.26s/it] + +{'loss': 0.467, 'learning_rate': 1.700244433460178e-05, 'epoch': 0.28} + + 28%|██▊ | 2033/7378 [6:58:55<18:11:45, 12.26s/it] + 28%|██▊ | 2034/7378 [6:59:07<18:18:07, 12.33s/it] + +{'loss': 0.4587, 'learning_rate': 1.699930951659289e-05, 'epoch': 0.28} + + 28%|██▊ | 2034/7378 [6:59:07<18:18:07, 12.33s/it] + 28%|██▊ | 
2035/7378 [6:59:19<18:14:07, 12.29s/it] + +{'loss': 0.4424, 'learning_rate': 1.6996173349576783e-05, 'epoch': 0.28} + + 28%|██▊ | 2035/7378 [6:59:19<18:14:07, 12.29s/it] + 28%|██▊ | 2036/7378 [6:59:32<18:09:01, 12.23s/it] + +{'loss': 0.4447, 'learning_rate': 1.6993035834157905e-05, 'epoch': 0.28} + + 28%|██▊ | 2036/7378 [6:59:32<18:09:01, 12.23s/it] + 28%|██▊ | 2037/7378 [6:59:44<18:12:21, 12.27s/it] + +{'loss': 0.4935, 'learning_rate': 1.6989896970940966e-05, 'epoch': 0.28} + + 28%|██▊ | 2037/7378 [6:59:44<18:12:21, 12.27s/it] + 28%|██▊ | 2038/7378 [6:59:56<18:15:38, 12.31s/it] + +{'loss': 0.5113, 'learning_rate': 1.698675676053092e-05, 'epoch': 0.28} + + 28%|██▊ | 2038/7378 [6:59:56<18:15:38, 12.31s/it] + 28%|██▊ | 2039/7378 [7:00:09<18:15:49, 12.31s/it] + +{'loss': 0.432, 'learning_rate': 1.698361520353301e-05, 'epoch': 0.28} + + 28%|██▊ | 2039/7378 [7:00:09<18:15:49, 12.31s/it] + 28%|██▊ | 2040/7378 [7:00:21<18:13:41, 12.29s/it] + +{'loss': 0.5459, 'learning_rate': 1.6980472300552712e-05, 'epoch': 0.28} + + 28%|██▊ | 2040/7378 [7:00:21<18:13:41, 12.29s/it] + 28%|██▊ | 2041/7378 [7:00:33<18:18:25, 12.35s/it] + +{'loss': 0.4799, 'learning_rate': 1.6977328052195777e-05, 'epoch': 0.28} + + 28%|██▊ | 2041/7378 [7:00:33<18:18:25, 12.35s/it] + 28%|██▊ | 2042/7378 [7:00:46<18:18:48, 12.36s/it] + +{'loss': 0.4962, 'learning_rate': 1.6974182459068203e-05, 'epoch': 0.28} + + 28%|██▊ | 2042/7378 [7:00:46<18:18:48, 12.36s/it] + 28%|██▊ | 2043/7378 [7:00:58<18:15:54, 12.33s/it] + +{'loss': 0.4587, 'learning_rate': 1.697103552177626e-05, 'epoch': 0.28} + + 28%|██▊ | 2043/7378 [7:00:58<18:15:54, 12.33s/it] + 28%|██▊ | 2044/7378 [7:01:11<18:28:10, 12.47s/it] + +{'loss': 0.4868, 'learning_rate': 1.6967887240926465e-05, 'epoch': 0.28} + + 28%|██▊ | 2044/7378 [7:01:11<18:28:10, 12.47s/it] + 28%|██▊ | 2045/7378 [7:01:23<18:20:23, 12.38s/it] + +{'loss': 0.4911, 'learning_rate': 1.6964737617125605e-05, 'epoch': 0.28} + + 28%|██▊ | 2045/7378 [7:01:23<18:20:23, 12.38s/it] + 28%|██▊ | 
2046/7378 [7:01:36<18:29:42, 12.49s/it] + +{'loss': 0.4324, 'learning_rate': 1.696158665098072e-05, 'epoch': 0.28} + + 28%|██▊ | 2046/7378 [7:01:36<18:29:42, 12.49s/it] + 28%|██▊ | 2047/7378 [7:01:48<18:18:40, 12.37s/it] + +{'loss': 0.5086, 'learning_rate': 1.6958434343099104e-05, 'epoch': 0.28} + + 28%|██▊ | 2047/7378 [7:01:48<18:18:40, 12.37s/it] + 28%|██▊ | 2048/7378 [7:02:00<18:17:11, 12.35s/it] + +{'loss': 0.4814, 'learning_rate': 1.695528069408832e-05, 'epoch': 0.28} + + 28%|██▊ | 2048/7378 [7:02:00<18:17:11, 12.35s/it] + 28%|██▊ | 2049/7378 [7:02:12<18:10:10, 12.27s/it] + +{'loss': 0.4739, 'learning_rate': 1.6952125704556186e-05, 'epoch': 0.28} + + 28%|██▊ | 2049/7378 [7:02:12<18:10:10, 12.27s/it] + 28%|██▊ | 2050/7378 [7:02:25<18:18:52, 12.37s/it] + +{'loss': 0.463, 'learning_rate': 1.6948969375110772e-05, 'epoch': 0.28} + + 28%|██▊ | 2050/7378 [7:02:25<18:18:52, 12.37s/it] + 28%|██▊ | 2051/7378 [7:02:37<18:10:52, 12.29s/it] + +{'loss': 0.4945, 'learning_rate': 1.6945811706360412e-05, 'epoch': 0.28} + + 28%|██▊ | 2051/7378 [7:02:37<18:10:52, 12.29s/it] + 28%|██▊ | 2052/7378 [7:02:49<18:14:25, 12.33s/it] + +{'loss': 0.5054, 'learning_rate': 1.69426526989137e-05, 'epoch': 0.28} + + 28%|██▊ | 2052/7378 [7:02:49<18:14:25, 12.33s/it] + 28%|██▊ | 2053/7378 [7:03:01<18:07:17, 12.25s/it] + +{'loss': 0.4223, 'learning_rate': 1.6939492353379483e-05, 'epoch': 0.28} + + 28%|██▊ | 2053/7378 [7:03:01<18:07:17, 12.25s/it] + 28%|██▊ | 2054/7378 [7:03:13<18:01:36, 12.19s/it] + +{'loss': 0.4756, 'learning_rate': 1.6936330670366867e-05, 'epoch': 0.28} + + 28%|██▊ | 2054/7378 [7:03:13<18:01:36, 12.19s/it] + 28%|██▊ | 2055/7378 [7:03:26<18:02:02, 12.20s/it] + +{'loss': 0.4621, 'learning_rate': 1.6933167650485222e-05, 'epoch': 0.28} + + 28%|██▊ | 2055/7378 [7:03:26<18:02:02, 12.20s/it] + 28%|██▊ | 2056/7378 [7:03:38<18:03:46, 12.22s/it] + +{'loss': 0.5265, 'learning_rate': 1.6930003294344163e-05, 'epoch': 0.28} + + 28%|██▊ | 2056/7378 [7:03:38<18:03:46, 12.22s/it] + 28%|██▊ | 
2057/7378 [7:03:50<18:12:52, 12.32s/it] + +{'loss': 0.3964, 'learning_rate': 1.6926837602553577e-05, 'epoch': 0.28} + + 28%|██▊ | 2057/7378 [7:03:50<18:12:52, 12.32s/it] + 28%|██▊ | 2058/7378 [7:04:03<18:19:40, 12.40s/it] + +{'loss': 0.4889, 'learning_rate': 1.6923670575723595e-05, 'epoch': 0.28} + + 28%|██▊ | 2058/7378 [7:04:03<18:19:40, 12.40s/it] + 28%|██▊ | 2059/7378 [7:04:15<18:09:19, 12.29s/it] + +{'loss': 0.4681, 'learning_rate': 1.692050221446462e-05, 'epoch': 0.28} + + 28%|██▊ | 2059/7378 [7:04:15<18:09:19, 12.29s/it] + 28%|██▊ | 2060/7378 [7:04:27<18:06:40, 12.26s/it] + +{'loss': 0.5057, 'learning_rate': 1.6917332519387294e-05, 'epoch': 0.28} + + 28%|██▊ | 2060/7378 [7:04:27<18:06:40, 12.26s/it] + 28%|██▊ | 2061/7378 [7:04:39<18:03:59, 12.23s/it] + +{'loss': 0.469, 'learning_rate': 1.6914161491102535e-05, 'epoch': 0.28} + + 28%|██▊ | 2061/7378 [7:04:39<18:03:59, 12.23s/it] + 28%|██▊ | 2062/7378 [7:04:51<17:53:20, 12.11s/it] + +{'loss': 0.4234, 'learning_rate': 1.69109891302215e-05, 'epoch': 0.28} + + 28%|██▊ | 2062/7378 [7:04:51<17:53:20, 12.11s/it] + 28%|██▊ | 2063/7378 [7:05:03<17:55:05, 12.14s/it] + +{'loss': 0.4924, 'learning_rate': 1.6907815437355625e-05, 'epoch': 0.28} + + 28%|██▊ | 2063/7378 [7:05:03<17:55:05, 12.14s/it] + 28%|██▊ | 2064/7378 [7:05:16<18:01:28, 12.21s/it] + +{'loss': 0.4749, 'learning_rate': 1.6904640413116576e-05, 'epoch': 0.28} + + 28%|██▊ | 2064/7378 [7:05:16<18:01:28, 12.21s/it] + 28%|██▊ | 2065/7378 [7:05:28<18:07:02, 12.28s/it] + +{'loss': 0.4765, 'learning_rate': 1.6901464058116298e-05, 'epoch': 0.28} + + 28%|██▊ | 2065/7378 [7:05:28<18:07:02, 12.28s/it] + 28%|██▊ | 2066/7378 [7:05:40<18:00:32, 12.20s/it] + +{'loss': 0.4722, 'learning_rate': 1.6898286372966976e-05, 'epoch': 0.28} + + 28%|██▊ | 2066/7378 [7:05:40<18:00:32, 12.20s/it] + 28%|██▊ | 2067/7378 [7:05:53<18:07:54, 12.29s/it] + +{'loss': 0.5043, 'learning_rate': 1.6895107358281065e-05, 'epoch': 0.28} + + 28%|██▊ | 2067/7378 [7:05:53<18:07:54, 12.29s/it] + 28%|██▊ | 
2068/7378 [7:06:05<18:09:15, 12.31s/it] + +{'loss': 0.5049, 'learning_rate': 1.689192701467127e-05, 'epoch': 0.28} + + 28%|██▊ | 2068/7378 [7:06:05<18:09:15, 12.31s/it] + 28%|██▊ | 2069/7378 [7:06:17<18:05:19, 12.27s/it] + +{'loss': 0.5021, 'learning_rate': 1.6888745342750553e-05, 'epoch': 0.28} + + 28%|██▊ | 2069/7378 [7:06:17<18:05:19, 12.27s/it] + 28%|██▊ | 2070/7378 [7:06:30<18:03:41, 12.25s/it] + +{'loss': 0.4731, 'learning_rate': 1.6885562343132124e-05, 'epoch': 0.28} + + 28%|██▊ | 2070/7378 [7:06:30<18:03:41, 12.25s/it] + 28%|██▊ | 2071/7378 [7:06:42<18:09:59, 12.32s/it] + +{'loss': 0.5002, 'learning_rate': 1.6882378016429467e-05, 'epoch': 0.28} + + 28%|██▊ | 2071/7378 [7:06:42<18:09:59, 12.32s/it] + 28%|██▊ | 2072/7378 [7:06:54<18:06:49, 12.29s/it] + +{'loss': 0.4433, 'learning_rate': 1.6879192363256304e-05, 'epoch': 0.28} + + 28%|██▊ | 2072/7378 [7:06:54<18:06:49, 12.29s/it] + 28%|██▊ | 2073/7378 [7:07:07<18:17:20, 12.41s/it] + +{'loss': 0.4562, 'learning_rate': 1.6876005384226623e-05, 'epoch': 0.28} + + 28%|██▊ | 2073/7378 [7:07:07<18:17:20, 12.41s/it] + 28%|██▊ | 2074/7378 [7:07:20<18:22:05, 12.47s/it] + +{'loss': 0.5072, 'learning_rate': 1.687281707995466e-05, 'epoch': 0.28} + + 28%|██▊ | 2074/7378 [7:07:20<18:22:05, 12.47s/it] + 28%|██▊ | 2075/7378 [7:07:33<18:39:20, 12.66s/it] + +{'loss': 0.4899, 'learning_rate': 1.6869627451054917e-05, 'epoch': 0.28} + + 28%|██▊ | 2075/7378 [7:07:33<18:39:20, 12.66s/it] + 28%|██▊ | 2076/7378 [7:07:45<18:43:14, 12.71s/it] + +{'loss': 0.4913, 'learning_rate': 1.6866436498142136e-05, 'epoch': 0.28} + + 28%|██▊ | 2076/7378 [7:07:45<18:43:14, 12.71s/it] + 28%|██▊ | 2077/7378 [7:07:58<18:43:25, 12.72s/it] + +{'loss': 0.4792, 'learning_rate': 1.6863244221831334e-05, 'epoch': 0.28} + + 28%|██▊ | 2077/7378 [7:07:58<18:43:25, 12.72s/it] + 28%|██▊ | 2078/7378 [7:08:11<18:37:04, 12.65s/it] + +{'loss': 0.5331, 'learning_rate': 1.6860050622737764e-05, 'epoch': 0.28} + + 28%|██▊ | 2078/7378 [7:08:11<18:37:04, 12.65s/it] + 28%|██▊ | 
2079/7378 [7:08:22<18:14:55, 12.40s/it] + +{'loss': 0.4547, 'learning_rate': 1.6856855701476947e-05, 'epoch': 0.28} + + 28%|██▊ | 2079/7378 [7:08:22<18:14:55, 12.40s/it] + 28%|██▊ | 2080/7378 [7:08:35<18:13:07, 12.38s/it] + +{'loss': 0.4761, 'learning_rate': 1.6853659458664653e-05, 'epoch': 0.28} + + 28%|██▊ | 2080/7378 [7:08:35<18:13:07, 12.38s/it] + 28%|██▊ | 2081/7378 [7:08:47<18:09:31, 12.34s/it] + +{'loss': 0.4659, 'learning_rate': 1.6850461894916903e-05, 'epoch': 0.28} + + 28%|██▊ | 2081/7378 [7:08:47<18:09:31, 12.34s/it] + 28%|██▊ | 2082/7378 [7:08:59<18:06:08, 12.31s/it] + +{'loss': 0.5144, 'learning_rate': 1.6847263010849983e-05, 'epoch': 0.28} + + 28%|██▊ | 2082/7378 [7:08:59<18:06:08, 12.31s/it] + 28%|██▊ | 2083/7378 [7:09:12<18:05:25, 12.30s/it] + +{'loss': 0.5119, 'learning_rate': 1.684406280708043e-05, 'epoch': 0.28} + + 28%|██▊ | 2083/7378 [7:09:12<18:05:25, 12.30s/it] + 28%|██▊ | 2084/7378 [7:09:24<18:17:20, 12.44s/it] + +{'loss': 0.4313, 'learning_rate': 1.6840861284225022e-05, 'epoch': 0.28} + + 28%|██▊ | 2084/7378 [7:09:24<18:17:20, 12.44s/it] + 28%|██▊ | 2085/7378 [7:09:37<18:19:05, 12.46s/it] + +{'loss': 0.4379, 'learning_rate': 1.6837658442900814e-05, 'epoch': 0.28} + + 28%|██▊ | 2085/7378 [7:09:37<18:19:05, 12.46s/it] + 28%|██▊ | 2086/7378 [7:09:50<18:28:20, 12.57s/it] + +{'loss': 0.4539, 'learning_rate': 1.6834454283725094e-05, 'epoch': 0.28} + + 28%|██▊ | 2086/7378 [7:09:50<18:28:20, 12.57s/it] + 28%|██▊ | 2087/7378 [7:10:02<18:23:16, 12.51s/it] + +{'loss': 0.5394, 'learning_rate': 1.6831248807315424e-05, 'epoch': 0.28} + + 28%|██▊ | 2087/7378 [7:10:02<18:23:16, 12.51s/it] + 28%|██▊ | 2088/7378 [7:10:14<18:13:16, 12.40s/it] + +{'loss': 0.5167, 'learning_rate': 1.68280420142896e-05, 'epoch': 0.28} + + 28%|██▊ | 2088/7378 [7:10:14<18:13:16, 12.40s/it] + 28%|██▊ | 2089/7378 [7:10:27<18:16:40, 12.44s/it] + +{'loss': 0.4637, 'learning_rate': 1.6824833905265685e-05, 'epoch': 0.28} + + 28%|██▊ | 2089/7378 [7:10:27<18:16:40, 12.44s/it] + 28%|██▊ | 
2090/7378 [7:10:39<18:08:18, 12.35s/it] + +{'loss': 0.4403, 'learning_rate': 1.6821624480861994e-05, 'epoch': 0.28} + + 28%|██▊ | 2090/7378 [7:10:39<18:08:18, 12.35s/it] + 28%|██▊ | 2091/7378 [7:10:51<17:57:47, 12.23s/it] + +{'loss': 0.468, 'learning_rate': 1.6818413741697086e-05, 'epoch': 0.28} + + 28%|██▊ | 2091/7378 [7:10:51<17:57:47, 12.23s/it] + 28%|██▊ | 2092/7378 [7:11:03<17:53:14, 12.18s/it] + +{'loss': 0.4785, 'learning_rate': 1.681520168838979e-05, 'epoch': 0.28} + + 28%|██▊ | 2092/7378 [7:11:03<17:53:14, 12.18s/it] + 28%|██▊ | 2093/7378 [7:11:15<17:53:36, 12.19s/it] + +{'loss': 0.5143, 'learning_rate': 1.6811988321559173e-05, 'epoch': 0.28} + + 28%|██▊ | 2093/7378 [7:11:15<17:53:36, 12.19s/it] + 28%|██▊ | 2094/7378 [7:11:27<17:49:57, 12.15s/it] + +{'loss': 0.4644, 'learning_rate': 1.6808773641824562e-05, 'epoch': 0.28} + + 28%|██▊ | 2094/7378 [7:11:27<17:49:57, 12.15s/it] + 28%|██▊ | 2095/7378 [7:11:39<17:48:30, 12.14s/it] + +{'loss': 0.4886, 'learning_rate': 1.6805557649805536e-05, 'epoch': 0.28} + + 28%|██▊ | 2095/7378 [7:11:39<17:48:30, 12.14s/it] + 28%|██▊ | 2096/7378 [7:11:52<18:00:13, 12.27s/it] + +{'loss': 0.4857, 'learning_rate': 1.680234034612193e-05, 'epoch': 0.28} + + 28%|██▊ | 2096/7378 [7:11:52<18:00:13, 12.27s/it] + 28%|██▊ | 2097/7378 [7:12:04<18:04:39, 12.32s/it] + +{'loss': 0.4847, 'learning_rate': 1.6799121731393825e-05, 'epoch': 0.28} + + 28%|██▊ | 2097/7378 [7:12:04<18:04:39, 12.32s/it] + 28%|██▊ | 2098/7378 [7:12:17<18:08:46, 12.37s/it] + +{'loss': 0.5423, 'learning_rate': 1.679590180624156e-05, 'epoch': 0.28} + + 28%|██▊ | 2098/7378 [7:12:17<18:08:46, 12.37s/it] + 28%|██▊ | 2099/7378 [7:12:29<17:53:03, 12.20s/it] + +{'loss': 0.5284, 'learning_rate': 1.6792680571285726e-05, 'epoch': 0.28} + + 28%|██▊ | 2099/7378 [7:12:29<17:53:03, 12.20s/it] + 28%|██▊ | 2100/7378 [7:12:41<17:52:18, 12.19s/it] + +{'loss': 0.4512, 'learning_rate': 1.6789458027147163e-05, 'epoch': 0.28} + + 28%|██▊ | 2100/7378 [7:12:41<17:52:18, 12.19s/it] + 28%|██▊ | 
2101/7378 [7:12:53<17:54:07, 12.21s/it] + +{'loss': 0.5457, 'learning_rate': 1.6786234174446966e-05, 'epoch': 0.28} + + 28%|██▊ | 2101/7378 [7:12:53<17:54:07, 12.21s/it] + 28%|██▊ | 2102/7378 [7:13:06<18:07:20, 12.37s/it] + +{'loss': 0.4683, 'learning_rate': 1.678300901380649e-05, 'epoch': 0.28} + + 28%|██▊ | 2102/7378 [7:13:06<18:07:20, 12.37s/it] + 29%|██▊ | 2103/7378 [7:13:18<18:01:38, 12.30s/it] + +{'loss': 0.5, 'learning_rate': 1.6779782545847322e-05, 'epoch': 0.29} + + 29%|██▊ | 2103/7378 [7:13:18<18:01:38, 12.30s/it] + 29%|██▊ | 2104/7378 [7:13:30<18:09:01, 12.39s/it] + +{'loss': 0.4219, 'learning_rate': 1.6776554771191324e-05, 'epoch': 0.29} + + 29%|██▊ | 2104/7378 [7:13:30<18:09:01, 12.39s/it] + 29%|██▊ | 2105/7378 [7:13:42<17:55:39, 12.24s/it] + +{'loss': 0.4475, 'learning_rate': 1.677332569046059e-05, 'epoch': 0.29} + + 29%|██▊ | 2105/7378 [7:13:42<17:55:39, 12.24s/it] + 29%|██▊ | 2106/7378 [7:13:55<17:55:14, 12.24s/it] + +{'loss': 0.463, 'learning_rate': 1.6770095304277477e-05, 'epoch': 0.29} + + 29%|██▊ | 2106/7378 [7:13:55<17:55:14, 12.24s/it] + 29%|██▊ | 2107/7378 [7:14:07<17:48:55, 12.17s/it] + +{'loss': 0.5318, 'learning_rate': 1.6766863613264596e-05, 'epoch': 0.29} + + 29%|██▊ | 2107/7378 [7:14:07<17:48:55, 12.17s/it] + 29%|██▊ | 2108/7378 [7:14:19<17:55:14, 12.24s/it] + +{'loss': 0.4452, 'learning_rate': 1.6763630618044802e-05, 'epoch': 0.29} + + 29%|██▊ | 2108/7378 [7:14:19<17:55:14, 12.24s/it] + 29%|██▊ | 2109/7378 [7:14:31<17:56:59, 12.26s/it] + +{'loss': 0.5045, 'learning_rate': 1.6760396319241204e-05, 'epoch': 0.29} + + 29%|██▊ | 2109/7378 [7:14:31<17:56:59, 12.26s/it] + 29%|██▊ | 2110/7378 [7:14:44<18:07:15, 12.38s/it] + +{'loss': 0.4606, 'learning_rate': 1.6757160717477157e-05, 'epoch': 0.29} + + 29%|██▊ | 2110/7378 [7:14:44<18:07:15, 12.38s/it] + 29%|██▊ | 2111/7378 [7:14:56<18:06:13, 12.37s/it] + +{'loss': 0.4328, 'learning_rate': 1.6753923813376285e-05, 'epoch': 0.29} + + 29%|██▊ | 2111/7378 [7:14:56<18:06:13, 12.37s/it] + 29%|██▊ | 
2112/7378 [7:15:09<18:06:55, 12.38s/it] + +{'loss': 0.5085, 'learning_rate': 1.675068560756244e-05, 'epoch': 0.29} + + 29%|██▊ | 2112/7378 [7:15:09<18:06:55, 12.38s/it] + 29%|██▊ | 2113/7378 [7:15:21<18:07:27, 12.39s/it] + +{'loss': 0.4866, 'learning_rate': 1.674744610065974e-05, 'epoch': 0.29} + + 29%|██▊ | 2113/7378 [7:15:21<18:07:27, 12.39s/it] + 29%|██▊ | 2114/7378 [7:15:34<18:12:23, 12.45s/it] + +{'loss': 0.4715, 'learning_rate': 1.674420529329255e-05, 'epoch': 0.29} + + 29%|██▊ | 2114/7378 [7:15:34<18:12:23, 12.45s/it] + 29%|██▊ | 2115/7378 [7:15:46<18:00:10, 12.31s/it] + +{'loss': 0.4996, 'learning_rate': 1.6740963186085478e-05, 'epoch': 0.29} + + 29%|██▊ | 2115/7378 [7:15:46<18:00:10, 12.31s/it] + 29%|██▊ | 2116/7378 [7:15:58<17:52:25, 12.23s/it] + +{'loss': 0.4551, 'learning_rate': 1.67377197796634e-05, 'epoch': 0.29} + + 29%|██▊ | 2116/7378 [7:15:58<17:52:25, 12.23s/it] + 29%|██▊ | 2117/7378 [7:16:10<18:02:50, 12.35s/it] + +{'loss': 0.5067, 'learning_rate': 1.6734475074651418e-05, 'epoch': 0.29} + + 29%|██▊ | 2117/7378 [7:16:10<18:02:50, 12.35s/it] + 29%|██▊ | 2118/7378 [7:16:23<18:04:50, 12.37s/it] + +{'loss': 0.5217, 'learning_rate': 1.6731229071674914e-05, 'epoch': 0.29} + + 29%|██▊ | 2118/7378 [7:16:23<18:04:50, 12.37s/it] + 29%|██▊ | 2119/7378 [7:16:35<17:59:31, 12.32s/it] + +{'loss': 0.472, 'learning_rate': 1.6727981771359492e-05, 'epoch': 0.29} + + 29%|██▊ | 2119/7378 [7:16:35<17:59:31, 12.32s/it] + 29%|██▊ | 2120/7378 [7:16:47<17:53:22, 12.25s/it] + +{'loss': 0.4658, 'learning_rate': 1.6724733174331022e-05, 'epoch': 0.29} + + 29%|██▊ | 2120/7378 [7:16:47<17:53:22, 12.25s/it] + 29%|██▊ | 2121/7378 [7:17:00<17:57:11, 12.29s/it] + +{'loss': 0.4338, 'learning_rate': 1.6721483281215622e-05, 'epoch': 0.29} + + 29%|██▊ | 2121/7378 [7:17:00<17:57:11, 12.29s/it] + 29%|██▉ | 2122/7378 [7:17:12<17:58:01, 12.31s/it] + +{'loss': 0.5767, 'learning_rate': 1.6718232092639657e-05, 'epoch': 0.29} + + 29%|██▉ | 2122/7378 [7:17:12<17:58:01, 12.31s/it] + 29%|██▉ | 
2123/7378 [7:17:24<17:55:04, 12.27s/it] + +{'loss': 0.4738, 'learning_rate': 1.6714979609229743e-05, 'epoch': 0.29} + + 29%|██▉ | 2123/7378 [7:17:24<17:55:04, 12.27s/it] + 29%|██▉ | 2124/7378 [7:17:37<17:59:51, 12.33s/it] + +{'loss': 0.5089, 'learning_rate': 1.6711725831612743e-05, 'epoch': 0.29} + + 29%|██▉ | 2124/7378 [7:17:37<17:59:51, 12.33s/it] + 29%|██▉ | 2125/7378 [7:17:49<18:04:29, 12.39s/it] + +{'loss': 0.4379, 'learning_rate': 1.6708470760415774e-05, 'epoch': 0.29} + + 29%|██▉ | 2125/7378 [7:17:49<18:04:29, 12.39s/it] + 29%|██▉ | 2126/7378 [7:18:02<18:21:34, 12.58s/it] + +{'loss': 0.4881, 'learning_rate': 1.6705214396266196e-05, 'epoch': 0.29} + + 29%|██▉ | 2126/7378 [7:18:02<18:21:34, 12.58s/it] + 29%|██▉ | 2127/7378 [7:18:14<18:15:51, 12.52s/it] + +{'loss': 0.4173, 'learning_rate': 1.670195673979163e-05, 'epoch': 0.29} + + 29%|██▉ | 2127/7378 [7:18:14<18:15:51, 12.52s/it] + 29%|██▉ | 2128/7378 [7:18:27<18:10:33, 12.46s/it] + +{'loss': 0.5525, 'learning_rate': 1.6698697791619928e-05, 'epoch': 0.29} + + 29%|██▉ | 2128/7378 [7:18:27<18:10:33, 12.46s/it] + 29%|██▉ | 2129/7378 [7:18:39<18:07:26, 12.43s/it] + +{'loss': 0.4577, 'learning_rate': 1.669543755237921e-05, 'epoch': 0.29} + + 29%|██▉ | 2129/7378 [7:18:39<18:07:26, 12.43s/it] + 29%|██▉ | 2130/7378 [7:18:52<18:15:58, 12.53s/it] + +{'loss': 0.5068, 'learning_rate': 1.6692176022697834e-05, 'epoch': 0.29} + + 29%|██▉ | 2130/7378 [7:18:52<18:15:58, 12.53s/it] + 29%|██▉ | 2131/7378 [7:19:05<18:17:54, 12.55s/it] + +{'loss': 0.4376, 'learning_rate': 1.6688913203204404e-05, 'epoch': 0.29} + + 29%|██▉ | 2131/7378 [7:19:05<18:17:54, 12.55s/it] + 29%|██▉ | 2132/7378 [7:19:17<18:16:27, 12.54s/it] + +{'loss': 0.4321, 'learning_rate': 1.668564909452778e-05, 'epoch': 0.29} + + 29%|██▉ | 2132/7378 [7:19:17<18:16:27, 12.54s/it] + 29%|██▉ | 2133/7378 [7:19:29<18:06:11, 12.43s/it] + +{'loss': 0.4326, 'learning_rate': 1.668238369729707e-05, 'epoch': 0.29} + + 29%|██▉ | 2133/7378 [7:19:29<18:06:11, 12.43s/it] + 29%|██▉ | 
2134/7378 [7:19:42<18:11:23, 12.49s/it] + +{'loss': 0.5873, 'learning_rate': 1.667911701214163e-05, 'epoch': 0.29} + + 29%|██▉ | 2134/7378 [7:19:42<18:11:23, 12.49s/it] + 29%|██▉ | 2135/7378 [7:19:54<17:58:46, 12.35s/it] + +{'loss': 0.4937, 'learning_rate': 1.6675849039691057e-05, 'epoch': 0.29} + + 29%|██▉ | 2135/7378 [7:19:54<17:58:46, 12.35s/it] + 29%|██▉ | 2136/7378 [7:20:06<18:06:52, 12.44s/it] + +{'loss': 0.5721, 'learning_rate': 1.6672579780575202e-05, 'epoch': 0.29} + + 29%|██▉ | 2136/7378 [7:20:06<18:06:52, 12.44s/it] + 29%|██▉ | 2137/7378 [7:20:19<18:05:27, 12.43s/it] + +{'loss': 0.5193, 'learning_rate': 1.6669309235424166e-05, 'epoch': 0.29} + + 29%|██▉ | 2137/7378 [7:20:19<18:05:27, 12.43s/it] + 29%|██▉ | 2138/7378 [7:20:31<17:54:35, 12.30s/it] + +{'loss': 0.4022, 'learning_rate': 1.6666037404868295e-05, 'epoch': 0.29} + + 29%|██▉ | 2138/7378 [7:20:31<17:54:35, 12.30s/it] + 29%|██▉ | 2139/7378 [7:20:43<17:43:27, 12.18s/it] + +{'loss': 0.453, 'learning_rate': 1.666276428953818e-05, 'epoch': 0.29} + + 29%|██▉ | 2139/7378 [7:20:43<17:43:27, 12.18s/it] + 29%|██▉ | 2140/7378 [7:20:55<17:34:13, 12.08s/it] + +{'loss': 0.4056, 'learning_rate': 1.6659489890064666e-05, 'epoch': 0.29} + + 29%|██▉ | 2140/7378 [7:20:55<17:34:13, 12.08s/it] + 29%|██▉ | 2141/7378 [7:21:07<17:45:10, 12.20s/it] + +{'loss': 0.5095, 'learning_rate': 1.665621420707884e-05, 'epoch': 0.29} + + 29%|██▉ | 2141/7378 [7:21:07<17:45:10, 12.20s/it] + 29%|██▉ | 2142/7378 [7:21:20<17:57:47, 12.35s/it] + +{'loss': 0.4651, 'learning_rate': 1.665293724121204e-05, 'epoch': 0.29} + + 29%|██▉ | 2142/7378 [7:21:20<17:57:47, 12.35s/it] + 29%|██▉ | 2143/7378 [7:21:33<18:06:58, 12.46s/it] + +{'loss': 0.4497, 'learning_rate': 1.6649658993095853e-05, 'epoch': 0.29} + + 29%|██▉ | 2143/7378 [7:21:33<18:06:58, 12.46s/it] + 29%|██▉ | 2144/7378 [7:21:45<17:57:49, 12.36s/it] + +{'loss': 0.4694, 'learning_rate': 1.6646379463362102e-05, 'epoch': 0.29} + + 29%|██▉ | 2144/7378 [7:21:45<17:57:49, 12.36s/it] + 29%|██▉ | 
2145/7378 [7:21:57<17:47:12, 12.24s/it] + +{'loss': 0.4498, 'learning_rate': 1.6643098652642875e-05, 'epoch': 0.29} + + 29%|██▉ | 2145/7378 [7:21:57<17:47:12, 12.24s/it] + 29%|██▉ | 2146/7378 [7:22:09<17:58:38, 12.37s/it] + +{'loss': 0.5158, 'learning_rate': 1.663981656157049e-05, 'epoch': 0.29} + + 29%|██▉ | 2146/7378 [7:22:09<17:58:38, 12.37s/it] + 29%|██▉ | 2147/7378 [7:22:21<17:52:06, 12.30s/it] + +{'loss': 0.4921, 'learning_rate': 1.6636533190777515e-05, 'epoch': 0.29} + + 29%|██▉ | 2147/7378 [7:22:21<17:52:06, 12.30s/it] + 29%|██▉ | 2148/7378 [7:22:33<17:43:22, 12.20s/it] + +{'loss': 0.4463, 'learning_rate': 1.6633248540896775e-05, 'epoch': 0.29} + + 29%|██▉ | 2148/7378 [7:22:33<17:43:22, 12.20s/it] + 29%|██▉ | 2149/7378 [7:22:46<17:42:15, 12.19s/it] + +{'loss': 0.4925, 'learning_rate': 1.6629962612561337e-05, 'epoch': 0.29} + + 29%|██▉ | 2149/7378 [7:22:46<17:42:15, 12.19s/it] + 29%|██▉ | 2150/7378 [7:22:58<17:46:53, 12.24s/it] + +{'loss': 0.4935, 'learning_rate': 1.6626675406404503e-05, 'epoch': 0.29} + + 29%|██▉ | 2150/7378 [7:22:58<17:46:53, 12.24s/it] + 29%|██▉ | 2151/7378 [7:23:10<17:47:57, 12.26s/it] + +{'loss': 0.4733, 'learning_rate': 1.662338692305984e-05, 'epoch': 0.29} + + 29%|██▉ | 2151/7378 [7:23:10<17:47:57, 12.26s/it] + 29%|██▉ | 2152/7378 [7:23:23<17:51:25, 12.30s/it] + +{'loss': 0.4701, 'learning_rate': 1.6620097163161143e-05, 'epoch': 0.29} + + 29%|██▉ | 2152/7378 [7:23:23<17:51:25, 12.30s/it] + 29%|██▉ | 2153/7378 [7:23:35<17:53:41, 12.33s/it] + +{'loss': 0.4659, 'learning_rate': 1.6616806127342472e-05, 'epoch': 0.29} + + 29%|██▉ | 2153/7378 [7:23:35<17:53:41, 12.33s/it] + 29%|██▉ | 2154/7378 [7:23:47<17:53:41, 12.33s/it] + +{'loss': 0.4977, 'learning_rate': 1.661351381623811e-05, 'epoch': 0.29} + + 29%|██▉ | 2154/7378 [7:23:47<17:53:41, 12.33s/it] + 29%|██▉ | 2155/7378 [7:24:00<17:51:22, 12.31s/it] + +{'loss': 0.4934, 'learning_rate': 1.661022023048261e-05, 'epoch': 0.29} + + 29%|██▉ | 2155/7378 [7:24:00<17:51:22, 12.31s/it] + 29%|██▉ | 
2156/7378 [7:24:12<17:51:40, 12.31s/it] + +{'loss': 0.4503, 'learning_rate': 1.660692537071075e-05, 'epoch': 0.29} + + 29%|██▉ | 2156/7378 [7:24:12<17:51:40, 12.31s/it] + 29%|██▉ | 2157/7378 [7:24:25<17:58:52, 12.40s/it] + +{'loss': 0.5412, 'learning_rate': 1.6603629237557567e-05, 'epoch': 0.29} + + 29%|██▉ | 2157/7378 [7:24:25<17:58:52, 12.40s/it] + 29%|██▉ | 2158/7378 [7:24:36<17:46:50, 12.26s/it] + +{'loss': 0.439, 'learning_rate': 1.660033183165834e-05, 'epoch': 0.29} + + 29%|██▉ | 2158/7378 [7:24:36<17:46:50, 12.26s/it] + 29%|██▉ | 2159/7378 [7:24:48<17:40:41, 12.19s/it] + +{'loss': 0.5145, 'learning_rate': 1.6597033153648593e-05, 'epoch': 0.29} + + 29%|██▉ | 2159/7378 [7:24:48<17:40:41, 12.19s/it] + 29%|██▉ | 2160/7378 [7:25:01<17:47:42, 12.28s/it] + +{'loss': 0.4969, 'learning_rate': 1.659373320416409e-05, 'epoch': 0.29} + + 29%|██▉ | 2160/7378 [7:25:01<17:47:42, 12.28s/it] + 29%|██▉ | 2161/7378 [7:25:13<17:39:27, 12.18s/it] + +{'loss': 0.515, 'learning_rate': 1.6590431983840845e-05, 'epoch': 0.29} + + 29%|██▉ | 2161/7378 [7:25:13<17:39:27, 12.18s/it] + 29%|██▉ | 2162/7378 [7:25:26<17:54:35, 12.36s/it] + +{'loss': 0.5286, 'learning_rate': 1.658712949331512e-05, 'epoch': 0.29} + + 29%|██▉ | 2162/7378 [7:25:26<17:54:35, 12.36s/it] + 29%|██▉ | 2163/7378 [7:25:38<17:56:58, 12.39s/it] + +{'loss': 0.4823, 'learning_rate': 1.658382573322342e-05, 'epoch': 0.29} + + 29%|██▉ | 2163/7378 [7:25:38<17:56:58, 12.39s/it] + 29%|██▉ | 2164/7378 [7:25:50<17:46:51, 12.28s/it] + +{'loss': 0.4945, 'learning_rate': 1.6580520704202484e-05, 'epoch': 0.29} + + 29%|██▉ | 2164/7378 [7:25:50<17:46:51, 12.28s/it] + 29%|██▉ | 2165/7378 [7:26:02<17:39:16, 12.19s/it] + +{'loss': 0.5242, 'learning_rate': 1.657721440688931e-05, 'epoch': 0.29} + + 29%|██▉ | 2165/7378 [7:26:02<17:39:16, 12.19s/it] + 29%|██▉ | 2166/7378 [7:26:14<17:38:01, 12.18s/it] + +{'loss': 0.4672, 'learning_rate': 1.6573906841921138e-05, 'epoch': 0.29} + + 29%|██▉ | 2166/7378 [7:26:14<17:38:01, 12.18s/it] + 29%|██▉ | 
2167/7378 [7:26:27<17:50:26, 12.33s/it] + +{'loss': 0.5396, 'learning_rate': 1.6570598009935447e-05, 'epoch': 0.29} + + 29%|██▉ | 2167/7378 [7:26:27<17:50:26, 12.33s/it] + 29%|██▉ | 2168/7378 [7:26:39<17:53:26, 12.36s/it] + +{'loss': 0.4683, 'learning_rate': 1.6567287911569964e-05, 'epoch': 0.29} + + 29%|██▉ | 2168/7378 [7:26:39<17:53:26, 12.36s/it] + 29%|██▉ | 2169/7378 [7:26:52<17:48:23, 12.31s/it] + +{'loss': 0.4451, 'learning_rate': 1.656397654746265e-05, 'epoch': 0.29} + + 29%|██▉ | 2169/7378 [7:26:52<17:48:23, 12.31s/it] + 29%|██▉ | 2170/7378 [7:27:04<17:55:14, 12.39s/it] + +{'loss': 0.4591, 'learning_rate': 1.656066391825173e-05, 'epoch': 0.29} + + 29%|██▉ | 2170/7378 [7:27:04<17:55:14, 12.39s/it] + 29%|██▉ | 2171/7378 [7:27:17<18:00:05, 12.45s/it] + +{'loss': 0.4863, 'learning_rate': 1.6557350024575656e-05, 'epoch': 0.29} + + 29%|██▉ | 2171/7378 [7:27:17<18:00:05, 12.45s/it] + 29%|██▉ | 2172/7378 [7:27:29<17:52:49, 12.36s/it] + +{'loss': 0.4411, 'learning_rate': 1.6554034867073128e-05, 'epoch': 0.29} + + 29%|██▉ | 2172/7378 [7:27:29<17:52:49, 12.36s/it] + 29%|██▉ | 2173/7378 [7:27:41<17:41:49, 12.24s/it] + +{'loss': 0.424, 'learning_rate': 1.655071844638309e-05, 'epoch': 0.29} + + 29%|██▉ | 2173/7378 [7:27:41<17:41:49, 12.24s/it] + 29%|██▉ | 2174/7378 [7:27:53<17:40:18, 12.22s/it] + +{'loss': 0.4818, 'learning_rate': 1.654740076314474e-05, 'epoch': 0.29} + + 29%|██▉ | 2174/7378 [7:27:53<17:40:18, 12.22s/it] + 29%|██▉ | 2175/7378 [7:28:06<17:46:07, 12.29s/it] + +{'loss': 0.4883, 'learning_rate': 1.6544081817997496e-05, 'epoch': 0.29} + + 29%|██▉ | 2175/7378 [7:28:06<17:46:07, 12.29s/it] + 29%|██▉ | 2176/7378 [7:28:18<17:39:57, 12.23s/it] + +{'loss': 0.4556, 'learning_rate': 1.6540761611581037e-05, 'epoch': 0.29} + + 29%|██▉ | 2176/7378 [7:28:18<17:39:57, 12.23s/it] + 30%|██▉ | 2177/7378 [7:28:30<17:35:34, 12.18s/it] + +{'loss': 0.493, 'learning_rate': 1.6537440144535288e-05, 'epoch': 0.3} + + 30%|██▉ | 2177/7378 [7:28:30<17:35:34, 12.18s/it] + 30%|██▉ | 
2178/7378 [7:28:42<17:34:05, 12.16s/it] + +{'loss': 0.4375, 'learning_rate': 1.65341174175004e-05, 'epoch': 0.3} + + 30%|██▉ | 2178/7378 [7:28:42<17:34:05, 12.16s/it] + 30%|██▉ | 2179/7378 [7:28:54<17:44:05, 12.28s/it] + +{'loss': 0.4134, 'learning_rate': 1.653079343111678e-05, 'epoch': 0.3} + + 30%|██▉ | 2179/7378 [7:28:54<17:44:05, 12.28s/it] + 30%|██▉ | 2180/7378 [7:29:07<17:44:10, 12.28s/it] + +{'loss': 0.4842, 'learning_rate': 1.6527468186025077e-05, 'epoch': 0.3} + + 30%|██▉ | 2180/7378 [7:29:07<17:44:10, 12.28s/it] + 30%|██▉ | 2181/7378 [7:29:19<17:38:44, 12.22s/it] + +{'loss': 0.4661, 'learning_rate': 1.6524141682866173e-05, 'epoch': 0.3} + + 30%|██▉ | 2181/7378 [7:29:19<17:38:44, 12.22s/it] + 30%|██▉ | 2182/7378 [7:29:31<17:35:58, 12.19s/it] + +{'loss': 0.5133, 'learning_rate': 1.652081392228121e-05, 'epoch': 0.3} + + 30%|██▉ | 2182/7378 [7:29:31<17:35:58, 12.19s/it] + 30%|██▉ | 2183/7378 [7:29:43<17:36:19, 12.20s/it] + +{'loss': 0.5, 'learning_rate': 1.6517484904911554e-05, 'epoch': 0.3} + + 30%|██▉ | 2183/7378 [7:29:43<17:36:19, 12.20s/it] + 30%|██▉ | 2184/7378 [7:29:56<17:48:19, 12.34s/it] + +{'loss': 0.449, 'learning_rate': 1.6514154631398823e-05, 'epoch': 0.3} + + 30%|██▉ | 2184/7378 [7:29:56<17:48:19, 12.34s/it] + 30%|██▉ | 2185/7378 [7:30:08<17:56:04, 12.43s/it] + +{'loss': 0.4336, 'learning_rate': 1.651082310238487e-05, 'epoch': 0.3} + + 30%|██▉ | 2185/7378 [7:30:08<17:56:04, 12.43s/it] + 30%|██▉ | 2186/7378 [7:30:21<17:48:47, 12.35s/it] + +{'loss': 0.5307, 'learning_rate': 1.6507490318511805e-05, 'epoch': 0.3} + + 30%|██▉ | 2186/7378 [7:30:21<17:48:47, 12.35s/it] + 30%|██▉ | 2187/7378 [7:30:33<17:53:59, 12.41s/it] + +{'loss': 0.457, 'learning_rate': 1.6504156280421963e-05, 'epoch': 0.3} + + 30%|██▉ | 2187/7378 [7:30:33<17:53:59, 12.41s/it] + 30%|██▉ | 2188/7378 [7:30:45<17:47:19, 12.34s/it] + +{'loss': 0.4655, 'learning_rate': 1.650082098875793e-05, 'epoch': 0.3} + + 30%|██▉ | 2188/7378 [7:30:45<17:47:19, 12.34s/it] + 30%|██▉ | 2189/7378 
[7:30:58<17:48:31, 12.36s/it] + +{'loss': 0.5143, 'learning_rate': 1.6497484444162528e-05, 'epoch': 0.3} + + 30%|██▉ | 2189/7378 [7:30:58<17:48:31, 12.36s/it] + 30%|██▉ | 2190/7378 [7:31:10<17:39:27, 12.25s/it] + +{'loss': 0.4778, 'learning_rate': 1.649414664727883e-05, 'epoch': 0.3} + + 30%|██▉ | 2190/7378 [7:31:10<17:39:27, 12.25s/it] + 30%|██▉ | 2191/7378 [7:31:22<17:46:25, 12.34s/it] + +{'loss': 0.5271, 'learning_rate': 1.6490807598750135e-05, 'epoch': 0.3} + + 30%|██▉ | 2191/7378 [7:31:22<17:46:25, 12.34s/it] + 30%|██▉ | 2192/7378 [7:31:34<17:38:58, 12.25s/it] + +{'loss': 0.5301, 'learning_rate': 1.648746729922e-05, 'epoch': 0.3} + + 30%|██▉ | 2192/7378 [7:31:34<17:38:58, 12.25s/it] + 30%|██▉ | 2193/7378 [7:31:46<17:37:57, 12.24s/it] + +{'loss': 0.4888, 'learning_rate': 1.6484125749332212e-05, 'epoch': 0.3} + + 30%|██▉ | 2193/7378 [7:31:46<17:37:57, 12.24s/it] + 30%|██▉ | 2194/7378 [7:31:59<17:35:41, 12.22s/it] + +{'loss': 0.4587, 'learning_rate': 1.6480782949730804e-05, 'epoch': 0.3} + + 30%|██▉ | 2194/7378 [7:31:59<17:35:41, 12.22s/it] + 30%|██▉ | 2195/7378 [7:32:11<17:38:21, 12.25s/it] + +{'loss': 0.4642, 'learning_rate': 1.6477438901060042e-05, 'epoch': 0.3} + + 30%|██▉ | 2195/7378 [7:32:11<17:38:21, 12.25s/it] + 30%|██▉ | 2196/7378 [7:32:23<17:40:50, 12.28s/it] + +{'loss': 0.4817, 'learning_rate': 1.6474093603964452e-05, 'epoch': 0.3} + + 30%|██▉ | 2196/7378 [7:32:23<17:40:50, 12.28s/it] + 30%|██▉ | 2197/7378 [7:32:36<17:37:50, 12.25s/it] + +{'loss': 0.4757, 'learning_rate': 1.6470747059088774e-05, 'epoch': 0.3} + + 30%|██▉ | 2197/7378 [7:32:36<17:37:50, 12.25s/it] + 30%|██▉ | 2198/7378 [7:32:48<17:35:22, 12.22s/it] + +{'loss': 0.466, 'learning_rate': 1.646739926707801e-05, 'epoch': 0.3} + + 30%|██▉ | 2198/7378 [7:32:48<17:35:22, 12.22s/it] + 30%|██▉ | 2199/7378 [7:33:00<17:43:26, 12.32s/it] + +{'loss': 0.5348, 'learning_rate': 1.6464050228577394e-05, 'epoch': 0.3} + + 30%|██▉ | 2199/7378 [7:33:00<17:43:26, 12.32s/it] + 30%|██▉ | 2200/7378 
[7:33:12<17:34:56, 12.22s/it] + +{'loss': 0.4295, 'learning_rate': 1.6460699944232397e-05, 'epoch': 0.3} + + 30%|██▉ | 2200/7378 [7:33:12<17:34:56, 12.22s/it] + 30%|██▉ | 2201/7378 [7:33:24<17:30:25, 12.17s/it] + +{'loss': 0.5007, 'learning_rate': 1.6457348414688737e-05, 'epoch': 0.3} + + 30%|██▉ | 2201/7378 [7:33:24<17:30:25, 12.17s/it] + 30%|██▉ | 2202/7378 [7:33:36<17:28:21, 12.15s/it] + +{'loss': 0.4937, 'learning_rate': 1.6453995640592368e-05, 'epoch': 0.3} + + 30%|██▉ | 2202/7378 [7:33:36<17:28:21, 12.15s/it] + 30%|██▉ | 2203/7378 [7:33:49<17:30:23, 12.18s/it] + +{'loss': 0.4734, 'learning_rate': 1.6450641622589484e-05, 'epoch': 0.3} + + 30%|██▉ | 2203/7378 [7:33:49<17:30:23, 12.18s/it] + 30%|██▉ | 2204/7378 [7:34:01<17:33:27, 12.22s/it] + +{'loss': 0.443, 'learning_rate': 1.644728636132652e-05, 'epoch': 0.3} + + 30%|██▉ | 2204/7378 [7:34:01<17:33:27, 12.22s/it] + 30%|██▉ | 2205/7378 [7:34:13<17:36:52, 12.26s/it] + +{'loss': 0.5241, 'learning_rate': 1.644392985745015e-05, 'epoch': 0.3} + + 30%|██▉ | 2205/7378 [7:34:13<17:36:52, 12.26s/it] + 30%|██▉ | 2206/7378 [7:34:26<17:51:10, 12.43s/it] + +{'loss': 0.4779, 'learning_rate': 1.6440572111607287e-05, 'epoch': 0.3} + + 30%|██▉ | 2206/7378 [7:34:26<17:51:10, 12.43s/it] + 30%|██▉ | 2207/7378 [7:34:38<17:44:19, 12.35s/it] + +{'loss': 0.5099, 'learning_rate': 1.6437213124445082e-05, 'epoch': 0.3} + + 30%|██▉ | 2207/7378 [7:34:38<17:44:19, 12.35s/it] + 30%|██▉ | 2208/7378 [7:34:51<17:43:53, 12.35s/it] + +{'loss': 0.4848, 'learning_rate': 1.643385289661093e-05, 'epoch': 0.3} + + 30%|██▉ | 2208/7378 [7:34:51<17:43:53, 12.35s/it] + 30%|██▉ | 2209/7378 [7:35:03<17:45:50, 12.37s/it] + +{'loss': 0.4504, 'learning_rate': 1.6430491428752465e-05, 'epoch': 0.3} + + 30%|██▉ | 2209/7378 [7:35:03<17:45:50, 12.37s/it] + 30%|██▉ | 2210/7378 [7:35:15<17:36:37, 12.27s/it] + +{'loss': 0.4757, 'learning_rate': 1.642712872151755e-05, 'epoch': 0.3} + + 30%|██▉ | 2210/7378 [7:35:15<17:36:37, 12.27s/it] + 30%|██▉ | 2211/7378 
[7:35:28<17:42:21, 12.34s/it] + +{'loss': 0.4102, 'learning_rate': 1.6423764775554302e-05, 'epoch': 0.3} + + 30%|██▉ | 2211/7378 [7:35:28<17:42:21, 12.34s/it] + 30%|██▉ | 2212/7378 [7:35:40<17:42:53, 12.34s/it] + +{'loss': 0.4826, 'learning_rate': 1.642039959151106e-05, 'epoch': 0.3} + + 30%|██▉ | 2212/7378 [7:35:40<17:42:53, 12.34s/it] + 30%|██▉ | 2213/7378 [7:35:52<17:46:17, 12.39s/it] + +{'loss': 0.4948, 'learning_rate': 1.641703317003642e-05, 'epoch': 0.3} + + 30%|██▉ | 2213/7378 [7:35:52<17:46:17, 12.39s/it] + 30%|███ | 2214/7378 [7:36:05<17:42:06, 12.34s/it] + +{'loss': 0.4553, 'learning_rate': 1.6413665511779197e-05, 'epoch': 0.3} + + 30%|███ | 2214/7378 [7:36:05<17:42:06, 12.34s/it] + 30%|███ | 2215/7378 [7:36:17<17:39:19, 12.31s/it] + +{'loss': 0.4199, 'learning_rate': 1.641029661738846e-05, 'epoch': 0.3} + + 30%|███ | 2215/7378 [7:36:17<17:39:19, 12.31s/it] + 30%|███ | 2216/7378 [7:36:29<17:40:29, 12.33s/it] + +{'loss': 0.4889, 'learning_rate': 1.6406926487513514e-05, 'epoch': 0.3} + + 30%|███ | 2216/7378 [7:36:29<17:40:29, 12.33s/it] + 30%|███ | 2217/7378 [7:36:41<17:31:42, 12.23s/it] + +{'loss': 0.5072, 'learning_rate': 1.6403555122803894e-05, 'epoch': 0.3} + + 30%|███ | 2217/7378 [7:36:41<17:31:42, 12.23s/it] + 30%|███ | 2218/7378 [7:36:54<17:40:03, 12.33s/it] + +{'loss': 0.476, 'learning_rate': 1.640018252390938e-05, 'epoch': 0.3} + + 30%|███ | 2218/7378 [7:36:54<17:40:03, 12.33s/it] + 30%|███ | 2219/7378 [7:37:07<17:53:06, 12.48s/it] + +{'loss': 0.4854, 'learning_rate': 1.6396808691479982e-05, 'epoch': 0.3} + + 30%|███ | 2219/7378 [7:37:07<17:53:06, 12.48s/it] + 30%|███ | 2220/7378 [7:37:19<17:47:24, 12.42s/it] + +{'loss': 0.4163, 'learning_rate': 1.6393433626165957e-05, 'epoch': 0.3} + + 30%|███ | 2220/7378 [7:37:19<17:47:24, 12.42s/it] + 30%|███ | 2221/7378 [7:37:31<17:46:53, 12.41s/it] + +{'loss': 0.4359, 'learning_rate': 1.6390057328617802e-05, 'epoch': 0.3} + + 30%|███ | 2221/7378 [7:37:31<17:46:53, 12.41s/it] + 30%|███ | 2222/7378 
[7:37:44<17:43:02, 12.37s/it] + +{'loss': 0.488, 'learning_rate': 1.6386679799486236e-05, 'epoch': 0.3} + + 30%|███ | 2222/7378 [7:37:44<17:43:02, 12.37s/it] + 30%|███ | 2223/7378 [7:37:56<17:36:59, 12.30s/it] + +{'loss': 0.4231, 'learning_rate': 1.6383301039422234e-05, 'epoch': 0.3} + + 30%|███ | 2223/7378 [7:37:56<17:36:59, 12.30s/it] + 30%|███ | 2224/7378 [7:38:08<17:44:06, 12.39s/it] + +{'loss': 0.4695, 'learning_rate': 1.6379921049076987e-05, 'epoch': 0.3} + + 30%|███ | 2224/7378 [7:38:08<17:44:06, 12.39s/it] + 30%|███ | 2225/7378 [7:38:21<17:39:04, 12.33s/it] + +{'loss': 0.5003, 'learning_rate': 1.6376539829101946e-05, 'epoch': 0.3} + + 30%|███ | 2225/7378 [7:38:21<17:39:04, 12.33s/it] + 30%|███ | 2226/7378 [7:38:33<17:47:00, 12.43s/it] + +{'loss': 0.5203, 'learning_rate': 1.6373157380148783e-05, 'epoch': 0.3} + + 30%|███ | 2226/7378 [7:38:33<17:47:00, 12.43s/it] + 30%|███ | 2227/7378 [7:38:46<17:48:15, 12.44s/it] + +{'loss': 0.4411, 'learning_rate': 1.636977370286941e-05, 'epoch': 0.3} + + 30%|███ | 2227/7378 [7:38:46<17:48:15, 12.44s/it] + 30%|███ | 2228/7378 [7:38:58<17:42:34, 12.38s/it] + +{'loss': 0.4674, 'learning_rate': 1.6366388797915987e-05, 'epoch': 0.3} + + 30%|███ | 2228/7378 [7:38:58<17:42:34, 12.38s/it] + 30%|███ | 2229/7378 [7:39:10<17:36:19, 12.31s/it] + +{'loss': 0.4311, 'learning_rate': 1.636300266594089e-05, 'epoch': 0.3} + + 30%|███ | 2229/7378 [7:39:10<17:36:19, 12.31s/it] + 30%|███ | 2230/7378 [7:39:23<17:46:23, 12.43s/it] + +{'loss': 0.4601, 'learning_rate': 1.635961530759675e-05, 'epoch': 0.3} + + 30%|███ | 2230/7378 [7:39:23<17:46:23, 12.43s/it] + 30%|███ | 2231/7378 [7:39:35<17:46:16, 12.43s/it] + +{'loss': 0.4697, 'learning_rate': 1.6356226723536427e-05, 'epoch': 0.3} + + 30%|███ | 2231/7378 [7:39:35<17:46:16, 12.43s/it] + 30%|███ | 2232/7378 [7:39:48<17:47:06, 12.44s/it] + +{'loss': 0.4926, 'learning_rate': 1.6352836914413014e-05, 'epoch': 0.3} + + 30%|███ | 2232/7378 [7:39:48<17:47:06, 12.44s/it] + 30%|███ | 2233/7378 
[7:40:00<17:50:08, 12.48s/it] + +{'loss': 0.5386, 'learning_rate': 1.6349445880879848e-05, 'epoch': 0.3} + + 30%|███ | 2233/7378 [7:40:00<17:50:08, 12.48s/it] + 30%|███ | 2234/7378 [7:40:13<17:48:14, 12.46s/it] + +{'loss': 0.4666, 'learning_rate': 1.634605362359049e-05, 'epoch': 0.3} + + 30%|███ | 2234/7378 [7:40:13<17:48:14, 12.46s/it] + 30%|███ | 2235/7378 [7:40:25<17:41:13, 12.38s/it] + +{'loss': 0.5285, 'learning_rate': 1.6342660143198756e-05, 'epoch': 0.3} + + 30%|███ | 2235/7378 [7:40:25<17:41:13, 12.38s/it] + 30%|███ | 2236/7378 [7:40:37<17:46:58, 12.45s/it] + +{'loss': 0.4767, 'learning_rate': 1.6339265440358676e-05, 'epoch': 0.3} + + 30%|███ | 2236/7378 [7:40:37<17:46:58, 12.45s/it] + 30%|███ | 2237/7378 [7:40:50<17:42:39, 12.40s/it] + +{'loss': 0.4673, 'learning_rate': 1.633586951572453e-05, 'epoch': 0.3} + + 30%|███ | 2237/7378 [7:40:50<17:42:39, 12.40s/it] + 30%|███ | 2238/7378 [7:41:02<17:32:52, 12.29s/it] + +{'loss': 0.4844, 'learning_rate': 1.6332472369950828e-05, 'epoch': 0.3} + + 30%|███ | 2238/7378 [7:41:02<17:32:52, 12.29s/it] + 30%|███ | 2239/7378 [7:41:14<17:30:27, 12.26s/it] + +{'loss': 0.4263, 'learning_rate': 1.632907400369232e-05, 'epoch': 0.3} + + 30%|███ | 2239/7378 [7:41:14<17:30:27, 12.26s/it] + 30%|███ | 2240/7378 [7:41:26<17:30:26, 12.27s/it] + +{'loss': 0.4411, 'learning_rate': 1.632567441760398e-05, 'epoch': 0.3} + + 30%|███ | 2240/7378 [7:41:26<17:30:26, 12.27s/it] + 30%|███ | 2241/7378 [7:41:38<17:26:37, 12.22s/it] + +{'loss': 0.4328, 'learning_rate': 1.6322273612341033e-05, 'epoch': 0.3} + + 30%|███ | 2241/7378 [7:41:38<17:26:37, 12.22s/it] + 30%|███ | 2242/7378 [7:41:50<17:23:57, 12.20s/it] + +{'loss': 0.4857, 'learning_rate': 1.631887158855893e-05, 'epoch': 0.3} + + 30%|███ | 2242/7378 [7:41:50<17:23:57, 12.20s/it] + 30%|███ | 2243/7378 [7:42:03<17:26:47, 12.23s/it] + +{'loss': 0.502, 'learning_rate': 1.631546834691335e-05, 'epoch': 0.3} + + 30%|███ | 2243/7378 [7:42:03<17:26:47, 12.23s/it] + 30%|███ | 2244/7378 
[7:42:15<17:28:43, 12.26s/it] + +{'loss': 0.4998, 'learning_rate': 1.6312063888060226e-05, 'epoch': 0.3} + + 30%|███ | 2244/7378 [7:42:15<17:28:43, 12.26s/it] + 30%|███ | 2245/7378 [7:42:27<17:26:47, 12.24s/it] + +{'loss': 0.4743, 'learning_rate': 1.6308658212655706e-05, 'epoch': 0.3} + + 30%|███ | 2245/7378 [7:42:27<17:26:47, 12.24s/it] + 30%|███ | 2246/7378 [7:42:40<17:33:30, 12.32s/it] + +{'loss': 0.485, 'learning_rate': 1.6305251321356183e-05, 'epoch': 0.3} + + 30%|███ | 2246/7378 [7:42:40<17:33:30, 12.32s/it] + 30%|███ | 2247/7378 [7:42:52<17:34:05, 12.33s/it] + +{'loss': 0.4919, 'learning_rate': 1.6301843214818284e-05, 'epoch': 0.3} + + 30%|███ | 2247/7378 [7:42:52<17:34:05, 12.33s/it] + 30%|███ | 2248/7378 [7:43:05<17:35:09, 12.34s/it] + +{'loss': 0.555, 'learning_rate': 1.6298433893698862e-05, 'epoch': 0.3} + + 30%|███ | 2248/7378 [7:43:05<17:35:09, 12.34s/it] + 30%|███ | 2249/7378 [7:43:17<17:41:30, 12.42s/it] + +{'loss': 0.4952, 'learning_rate': 1.6295023358655016e-05, 'epoch': 0.3} + + 30%|███ | 2249/7378 [7:43:17<17:41:30, 12.42s/it] + 30%|███ | 2250/7378 [7:43:29<17:31:56, 12.31s/it] + +{'loss': 0.4731, 'learning_rate': 1.6291611610344073e-05, 'epoch': 0.3} + + 30%|███ | 2250/7378 [7:43:29<17:31:56, 12.31s/it] + 31%|███ | 2251/7378 [7:43:41<17:31:29, 12.31s/it] + +{'loss': 0.4914, 'learning_rate': 1.6288198649423588e-05, 'epoch': 0.31} + + 31%|███ | 2251/7378 [7:43:41<17:31:29, 12.31s/it] + 31%|███ | 2252/7378 [7:43:54<17:28:16, 12.27s/it] + +{'loss': 0.4839, 'learning_rate': 1.6284784476551365e-05, 'epoch': 0.31} + + 31%|███ | 2252/7378 [7:43:54<17:28:16, 12.27s/it] + 31%|███ | 2253/7378 [7:44:06<17:20:02, 12.18s/it] + +{'loss': 0.46, 'learning_rate': 1.6281369092385424e-05, 'epoch': 0.31} + + 31%|███ | 2253/7378 [7:44:06<17:20:02, 12.18s/it] + 31%|███ | 2254/7378 [7:44:18<17:31:33, 12.31s/it] + +{'loss': 0.4811, 'learning_rate': 1.6277952497584027e-05, 'epoch': 0.31} + + 31%|███ | 2254/7378 [7:44:18<17:31:33, 12.31s/it] + 31%|███ | 2255/7378 
[7:44:30<17:30:09, 12.30s/it] + +{'loss': 0.4497, 'learning_rate': 1.627453469280568e-05, 'epoch': 0.31} + + 31%|███ | 2255/7378 [7:44:31<17:30:09, 12.30s/it] + 31%|███ | 2256/7378 [7:44:43<17:45:24, 12.48s/it] + +{'loss': 0.4319, 'learning_rate': 1.6271115678709098e-05, 'epoch': 0.31} + + 31%|███ | 2256/7378 [7:44:43<17:45:24, 12.48s/it] + 31%|███ | 2257/7378 [7:44:56<17:41:21, 12.44s/it] + +{'loss': 0.4221, 'learning_rate': 1.626769545595325e-05, 'epoch': 0.31} + + 31%|███ | 2257/7378 [7:44:56<17:41:21, 12.44s/it] + 31%|███ | 2258/7378 [7:45:08<17:48:04, 12.52s/it] + +{'loss': 0.4506, 'learning_rate': 1.6264274025197328e-05, 'epoch': 0.31} + + 31%|███ | 2258/7378 [7:45:08<17:48:04, 12.52s/it] + 31%|███ | 2259/7378 [7:45:21<17:36:20, 12.38s/it] + +{'loss': 0.4754, 'learning_rate': 1.626085138710076e-05, 'epoch': 0.31} + + 31%|███ | 2259/7378 [7:45:21<17:36:20, 12.38s/it] + 31%|███ | 2260/7378 [7:45:33<17:29:22, 12.30s/it] + +{'loss': 0.4459, 'learning_rate': 1.6257427542323204e-05, 'epoch': 0.31} + + 31%|███ | 2260/7378 [7:45:33<17:29:22, 12.30s/it] + 31%|███ | 2261/7378 [7:45:45<17:30:21, 12.32s/it] + +{'loss': 0.4855, 'learning_rate': 1.6254002491524555e-05, 'epoch': 0.31} + + 31%|███ | 2261/7378 [7:45:45<17:30:21, 12.32s/it] + 31%|███ | 2262/7378 [7:45:57<17:24:07, 12.25s/it] + +{'loss': 0.4685, 'learning_rate': 1.6250576235364938e-05, 'epoch': 0.31} + + 31%|███ | 2262/7378 [7:45:57<17:24:07, 12.25s/it] + 31%|███ | 2263/7378 [7:46:09<17:28:08, 12.29s/it] + +{'loss': 0.4535, 'learning_rate': 1.6247148774504705e-05, 'epoch': 0.31} + + 31%|███ | 2263/7378 [7:46:09<17:28:08, 12.29s/it] + 31%|███ | 2264/7378 [7:46:22<17:24:32, 12.26s/it] + +{'loss': 0.524, 'learning_rate': 1.6243720109604447e-05, 'epoch': 0.31} + + 31%|███ | 2264/7378 [7:46:22<17:24:32, 12.26s/it] + 31%|███ | 2265/7378 [7:46:34<17:25:16, 12.27s/it] + +{'loss': 0.4622, 'learning_rate': 1.6240290241324993e-05, 'epoch': 0.31} + + 31%|███ | 2265/7378 [7:46:34<17:25:16, 12.27s/it] + 31%|███ | 2266/7378 
[7:46:46<17:22:52, 12.24s/it] + +{'loss': 0.4819, 'learning_rate': 1.6236859170327394e-05, 'epoch': 0.31} + + 31%|███ | 2266/7378 [7:46:46<17:22:52, 12.24s/it] + 31%|███ | 2267/7378 [7:46:58<17:26:46, 12.29s/it] + +{'loss': 0.4961, 'learning_rate': 1.6233426897272925e-05, 'epoch': 0.31} + + 31%|███ | 2267/7378 [7:46:59<17:26:46, 12.29s/it] + 31%|███ | 2268/7378 [7:47:11<17:32:46, 12.36s/it] + +{'loss': 0.476, 'learning_rate': 1.6229993422823112e-05, 'epoch': 0.31} + + 31%|███ | 2268/7378 [7:47:11<17:32:46, 12.36s/it] + 31%|███ | 2269/7378 [7:47:24<17:45:37, 12.51s/it] + +{'loss': 0.4694, 'learning_rate': 1.6226558747639702e-05, 'epoch': 0.31} + + 31%|███ | 2269/7378 [7:47:24<17:45:37, 12.51s/it] + 31%|███ | 2270/7378 [7:47:37<17:50:43, 12.58s/it] + +{'loss': 0.5431, 'learning_rate': 1.6223122872384675e-05, 'epoch': 0.31} + + 31%|███ | 2270/7378 [7:47:37<17:50:43, 12.58s/it] + 31%|███ | 2271/7378 [7:47:49<17:42:48, 12.49s/it] + +{'loss': 0.525, 'learning_rate': 1.6219685797720236e-05, 'epoch': 0.31} + + 31%|███ | 2271/7378 [7:47:49<17:42:48, 12.49s/it] + 31%|███ | 2272/7378 [7:48:01<17:42:16, 12.48s/it] + +{'loss': 0.4587, 'learning_rate': 1.6216247524308838e-05, 'epoch': 0.31} + + 31%|███ | 2272/7378 [7:48:01<17:42:16, 12.48s/it] + 31%|███ | 2273/7378 [7:48:14<17:37:00, 12.42s/it] + +{'loss': 0.4746, 'learning_rate': 1.6212808052813148e-05, 'epoch': 0.31} + + 31%|███ | 2273/7378 [7:48:14<17:37:00, 12.42s/it] + 31%|███ | 2274/7378 [7:48:26<17:31:49, 12.36s/it] + +{'loss': 0.5035, 'learning_rate': 1.6209367383896075e-05, 'epoch': 0.31} + + 31%|███ | 2274/7378 [7:48:26<17:31:49, 12.36s/it] + 31%|███ | 2275/7378 [7:48:38<17:34:47, 12.40s/it] + +{'loss': 0.505, 'learning_rate': 1.6205925518220742e-05, 'epoch': 0.31} + + 31%|███ | 2275/7378 [7:48:38<17:34:47, 12.40s/it] + 31%|███ | 2276/7378 [7:48:51<17:43:38, 12.51s/it] + +{'loss': 0.4612, 'learning_rate': 1.6202482456450524e-05, 'epoch': 0.31} + + 31%|███ | 2276/7378 [7:48:51<17:43:38, 12.51s/it] + 31%|███ | 2277/7378 
[7:49:04<17:48:13, 12.56s/it] + +{'loss': 0.4813, 'learning_rate': 1.6199038199249023e-05, 'epoch': 0.31} + + 31%|███ | 2277/7378 [7:49:04<17:48:13, 12.56s/it] + 31%|███ | 2278/7378 [7:49:16<17:32:49, 12.39s/it] + +{'loss': 0.4602, 'learning_rate': 1.619559274728005e-05, 'epoch': 0.31} + + 31%|███ | 2278/7378 [7:49:16<17:32:49, 12.39s/it] + 31%|███ | 2279/7378 [7:49:28<17:36:59, 12.44s/it] + +{'loss': 0.5004, 'learning_rate': 1.6192146101207674e-05, 'epoch': 0.31} + + 31%|███ | 2279/7378 [7:49:28<17:36:59, 12.44s/it] + 31%|███ | 2280/7378 [7:49:40<17:22:42, 12.27s/it] + +{'loss': 0.4502, 'learning_rate': 1.618869826169618e-05, 'epoch': 0.31} + + 31%|███ | 2280/7378 [7:49:40<17:22:42, 12.27s/it] + 31%|███ | 2281/7378 [7:49:53<17:26:22, 12.32s/it] + +{'loss': 0.516, 'learning_rate': 1.618524922941008e-05, 'epoch': 0.31} + + 31%|███ | 2281/7378 [7:49:53<17:26:22, 12.32s/it] + 31%|███ | 2282/7378 [7:50:05<17:27:46, 12.34s/it] + +{'loss': 0.4863, 'learning_rate': 1.618179900501413e-05, 'epoch': 0.31} + + 31%|███ | 2282/7378 [7:50:05<17:27:46, 12.34s/it] + 31%|███ | 2283/7378 [7:50:17<17:28:07, 12.34s/it] + +{'loss': 0.523, 'learning_rate': 1.6178347589173298e-05, 'epoch': 0.31} + + 31%|███ | 2283/7378 [7:50:17<17:28:07, 12.34s/it] + 31%|███ | 2284/7378 [7:50:30<17:31:15, 12.38s/it] + +{'loss': 0.4696, 'learning_rate': 1.6174894982552788e-05, 'epoch': 0.31} + + 31%|███ | 2284/7378 [7:50:30<17:31:15, 12.38s/it] + 31%|███ | 2285/7378 [7:50:42<17:26:07, 12.32s/it] + +{'loss': 0.465, 'learning_rate': 1.6171441185818047e-05, 'epoch': 0.31} + + 31%|███ | 2285/7378 [7:50:42<17:26:07, 12.32s/it] + 31%|███ | 2286/7378 [7:50:55<17:34:18, 12.42s/it] + +{'loss': 0.4681, 'learning_rate': 1.6167986199634732e-05, 'epoch': 0.31} + + 31%|███ | 2286/7378 [7:50:55<17:34:18, 12.42s/it] + 31%|███ | 2287/7378 [7:51:07<17:30:15, 12.38s/it] + +{'loss': 0.4351, 'learning_rate': 1.6164530024668743e-05, 'epoch': 0.31} + + 31%|███ | 2287/7378 [7:51:07<17:30:15, 12.38s/it] + 31%|███ | 2288/7378 
[7:51:19<17:32:17, 12.40s/it] + +{'loss': 0.5107, 'learning_rate': 1.61610726615862e-05, 'epoch': 0.31} + + 31%|███ | 2288/7378 [7:51:19<17:32:17, 12.40s/it] + 31%|███ | 2289/7378 [7:51:32<17:37:20, 12.47s/it] + +{'loss': 0.4869, 'learning_rate': 1.6157614111053454e-05, 'epoch': 0.31} + + 31%|███ | 2289/7378 [7:51:32<17:37:20, 12.47s/it] + 31%|███ | 2290/7378 [7:51:44<17:34:45, 12.44s/it] + +{'loss': 0.4207, 'learning_rate': 1.615415437373709e-05, 'epoch': 0.31} + + 31%|███ | 2290/7378 [7:51:44<17:34:45, 12.44s/it] + 31%|███ | 2291/7378 [7:51:57<17:34:38, 12.44s/it] + +{'loss': 0.4976, 'learning_rate': 1.6150693450303913e-05, 'epoch': 0.31} + + 31%|███ | 2291/7378 [7:51:57<17:34:38, 12.44s/it] + 31%|███ | 2292/7378 [7:52:09<17:31:06, 12.40s/it] + +{'loss': 0.478, 'learning_rate': 1.6147231341420968e-05, 'epoch': 0.31} + + 31%|███ | 2292/7378 [7:52:09<17:31:06, 12.40s/it] + 31%|███ | 2293/7378 [7:52:21<17:24:08, 12.32s/it] + +{'loss': 0.4704, 'learning_rate': 1.614376804775552e-05, 'epoch': 0.31} + + 31%|███ | 2293/7378 [7:52:21<17:24:08, 12.32s/it] + 31%|███ | 2294/7378 [7:52:34<17:27:54, 12.37s/it] + +{'loss': 0.4716, 'learning_rate': 1.6140303569975064e-05, 'epoch': 0.31} + + 31%|███ | 2294/7378 [7:52:34<17:27:54, 12.37s/it] + 31%|███ | 2295/7378 [7:52:46<17:22:27, 12.31s/it] + +{'loss': 0.5206, 'learning_rate': 1.613683790874732e-05, 'epoch': 0.31} + + 31%|███ | 2295/7378 [7:52:46<17:22:27, 12.31s/it] + 31%|███ | 2296/7378 [7:52:58<17:27:36, 12.37s/it] + +{'loss': 0.5192, 'learning_rate': 1.6133371064740247e-05, 'epoch': 0.31} + + 31%|███ | 2296/7378 [7:52:59<17:27:36, 12.37s/it] + 31%|███ | 2297/7378 [7:53:11<17:32:00, 12.42s/it] + +{'loss': 0.4464, 'learning_rate': 1.612990303862202e-05, 'epoch': 0.31} + + 31%|███ | 2297/7378 [7:53:11<17:32:00, 12.42s/it] + 31%|███ | 2298/7378 [7:53:23<17:27:01, 12.37s/it] + +{'loss': 0.5118, 'learning_rate': 1.6126433831061052e-05, 'epoch': 0.31} + + 31%|███ | 2298/7378 [7:53:23<17:27:01, 12.37s/it] + 31%|███ | 2299/7378 
[7:53:36<17:25:29, 12.35s/it] + +{'loss': 0.4573, 'learning_rate': 1.612296344272597e-05, 'epoch': 0.31} + + 31%|███ | 2299/7378 [7:53:36<17:25:29, 12.35s/it] + 31%|███ | 2300/7378 [7:53:48<17:14:54, 12.23s/it] + +{'loss': 0.4689, 'learning_rate': 1.6119491874285645e-05, 'epoch': 0.31} + + 31%|███ | 2300/7378 [7:53:48<17:14:54, 12.23s/it] + 31%|███ | 2301/7378 [7:54:00<17:24:46, 12.35s/it] + +{'loss': 0.4741, 'learning_rate': 1.6116019126409162e-05, 'epoch': 0.31} + + 31%|███ | 2301/7378 [7:54:00<17:24:46, 12.35s/it] + 31%|███ | 2302/7378 [7:54:13<17:30:41, 12.42s/it] + +{'loss': 0.5153, 'learning_rate': 1.6112545199765844e-05, 'epoch': 0.31} + + 31%|███ | 2302/7378 [7:54:13<17:30:41, 12.42s/it] + 31%|███ | 2303/7378 [7:54:25<17:25:31, 12.36s/it] + +{'loss': 0.468, 'learning_rate': 1.610907009502523e-05, 'epoch': 0.31} + + 31%|███ | 2303/7378 [7:54:25<17:25:31, 12.36s/it] + 31%|███ | 2304/7378 [7:54:37<17:23:53, 12.34s/it] + +{'loss': 0.4505, 'learning_rate': 1.6105593812857097e-05, 'epoch': 0.31} + + 31%|███ | 2304/7378 [7:54:37<17:23:53, 12.34s/it] + 31%|███ | 2305/7378 [7:54:50<17:23:44, 12.34s/it] + +{'loss': 0.4192, 'learning_rate': 1.610211635393144e-05, 'epoch': 0.31} + + 31%|███ | 2305/7378 [7:54:50<17:23:44, 12.34s/it] + 31%|███▏ | 2306/7378 [7:55:03<17:41:13, 12.55s/it] + +{'loss': 0.4665, 'learning_rate': 1.6098637718918482e-05, 'epoch': 0.31} + + 31%|███▏ | 2306/7378 [7:55:03<17:41:13, 12.55s/it] + 31%|███▏ | 2307/7378 [7:55:15<17:40:27, 12.55s/it] + +{'loss': 0.5265, 'learning_rate': 1.6095157908488685e-05, 'epoch': 0.31} + + 31%|███▏ | 2307/7378 [7:55:15<17:40:27, 12.55s/it] + 31%|███▏ | 2308/7378 [7:55:28<17:35:18, 12.49s/it] + +{'loss': 0.4265, 'learning_rate': 1.609167692331272e-05, 'epoch': 0.31} + + 31%|███▏ | 2308/7378 [7:55:28<17:35:18, 12.49s/it] + 31%|███▏ | 2309/7378 [7:55:40<17:40:55, 12.56s/it] + +{'loss': 0.4962, 'learning_rate': 1.60881947640615e-05, 'epoch': 0.31} + + 31%|███▏ | 2309/7378 [7:55:40<17:40:55, 12.56s/it] + 31%|███▏ | 
2310/7378 [7:55:53<17:35:58, 12.50s/it] + +{'loss': 0.5579, 'learning_rate': 1.6084711431406144e-05, 'epoch': 0.31} + + 31%|███▏ | 2310/7378 [7:55:53<17:35:58, 12.50s/it] + 31%|███▏ | 2311/7378 [7:56:05<17:26:33, 12.39s/it] + +{'loss': 0.503, 'learning_rate': 1.608122692601802e-05, 'epoch': 0.31} + + 31%|███▏ | 2311/7378 [7:56:05<17:26:33, 12.39s/it] + 31%|███▏ | 2312/7378 [7:56:17<17:26:49, 12.40s/it] + +{'loss': 0.4675, 'learning_rate': 1.6077741248568712e-05, 'epoch': 0.31} + + 31%|███▏ | 2312/7378 [7:56:17<17:26:49, 12.40s/it] + 31%|███▏ | 2313/7378 [7:56:30<17:25:46, 12.39s/it] + +{'loss': 0.4742, 'learning_rate': 1.6074254399730024e-05, 'epoch': 0.31} + + 31%|███▏ | 2313/7378 [7:56:30<17:25:46, 12.39s/it] + 31%|███▏ | 2314/7378 [7:56:42<17:21:43, 12.34s/it] + +{'loss': 0.4903, 'learning_rate': 1.6070766380173997e-05, 'epoch': 0.31} + + 31%|███▏ | 2314/7378 [7:56:42<17:21:43, 12.34s/it] + 31%|███▏ | 2315/7378 [7:56:54<17:27:20, 12.41s/it] + +{'loss': 0.4765, 'learning_rate': 1.6067277190572887e-05, 'epoch': 0.31} + + 31%|███▏ | 2315/7378 [7:56:54<17:27:20, 12.41s/it] + 31%|███▏ | 2316/7378 [7:57:07<17:26:29, 12.40s/it] + +{'loss': 0.5085, 'learning_rate': 1.6063786831599186e-05, 'epoch': 0.31} + + 31%|███▏ | 2316/7378 [7:57:07<17:26:29, 12.40s/it] + 31%|███▏ | 2317/7378 [7:57:19<17:22:54, 12.36s/it] + +{'loss': 0.5044, 'learning_rate': 1.60602953039256e-05, 'epoch': 0.31} + + 31%|███▏ | 2317/7378 [7:57:19<17:22:54, 12.36s/it] + 31%|███▏ | 2318/7378 [7:57:31<17:24:41, 12.39s/it] + +{'loss': 0.4372, 'learning_rate': 1.605680260822507e-05, 'epoch': 0.31} + + 31%|███▏ | 2318/7378 [7:57:31<17:24:41, 12.39s/it] + 31%|███▏ | 2319/7378 [7:57:44<17:35:20, 12.52s/it] + +{'loss': 0.4489, 'learning_rate': 1.6053308745170757e-05, 'epoch': 0.31} + + 31%|███▏ | 2319/7378 [7:57:44<17:35:20, 12.52s/it] + 31%|███▏ | 2320/7378 [7:57:56<17:28:04, 12.43s/it] + +{'loss': 0.4809, 'learning_rate': 1.6049813715436047e-05, 'epoch': 0.31} + + 31%|███▏ | 2320/7378 [7:57:57<17:28:04, 
12.43s/it] + 31%|███▏ | 2321/7378 [7:58:09<17:17:05, 12.30s/it] + +{'loss': 0.4856, 'learning_rate': 1.604631751969456e-05, 'epoch': 0.31} + + 31%|███▏ | 2321/7378 [7:58:09<17:17:05, 12.30s/it] + 31%|███▏ | 2322/7378 [7:58:20<17:08:34, 12.21s/it] + +{'loss': 0.4831, 'learning_rate': 1.6042820158620123e-05, 'epoch': 0.31} + + 31%|███▏ | 2322/7378 [7:58:20<17:08:34, 12.21s/it] + 31%|███▏ | 2323/7378 [7:58:33<17:26:19, 12.42s/it] + +{'loss': 0.4513, 'learning_rate': 1.60393216328868e-05, 'epoch': 0.31} + + 31%|███▏ | 2323/7378 [7:58:33<17:26:19, 12.42s/it] + 31%|███▏ | 2324/7378 [7:58:46<17:24:59, 12.41s/it] + +{'loss': 0.5194, 'learning_rate': 1.6035821943168883e-05, 'epoch': 0.31} + + 31%|███▏ | 2324/7378 [7:58:46<17:24:59, 12.41s/it] + 32%|███▏ | 2325/7378 [7:58:58<17:19:09, 12.34s/it] + +{'loss': 0.5092, 'learning_rate': 1.603232109014088e-05, 'epoch': 0.32} + + 32%|███▏ | 2325/7378 [7:58:58<17:19:09, 12.34s/it] + 32%|███▏ | 2326/7378 [7:59:10<17:16:19, 12.31s/it] + +{'loss': 0.4812, 'learning_rate': 1.6028819074477517e-05, 'epoch': 0.32} + + 32%|███▏ | 2326/7378 [7:59:10<17:16:19, 12.31s/it] + 32%|███▏ | 2327/7378 [7:59:22<17:14:46, 12.29s/it] + +{'loss': 0.4574, 'learning_rate': 1.602531589685376e-05, 'epoch': 0.32} + + 32%|███▏ | 2327/7378 [7:59:22<17:14:46, 12.29s/it] + 32%|███▏ | 2328/7378 [7:59:35<17:13:17, 12.28s/it] + +{'loss': 0.5488, 'learning_rate': 1.6021811557944793e-05, 'epoch': 0.32} + + 32%|███▏ | 2328/7378 [7:59:35<17:13:17, 12.28s/it] + 32%|███▏ | 2329/7378 [7:59:47<17:12:54, 12.27s/it] + +{'loss': 0.4791, 'learning_rate': 1.601830605842602e-05, 'epoch': 0.32} + + 32%|███▏ | 2329/7378 [7:59:47<17:12:54, 12.27s/it] + 32%|███▏ | 2330/7378 [8:00:00<17:20:29, 12.37s/it] + +{'loss': 0.4608, 'learning_rate': 1.6014799398973072e-05, 'epoch': 0.32} + + 32%|███▏ | 2330/7378 [8:00:00<17:20:29, 12.37s/it] + 32%|███▏ | 2331/7378 [8:00:12<17:14:10, 12.29s/it] + +{'loss': 0.3793, 'learning_rate': 1.60112915802618e-05, 'epoch': 0.32} + + 32%|███▏ | 2331/7378 
[8:00:12<17:14:10, 12.29s/it] + 32%|███▏ | 2332/7378 [8:00:25<17:27:57, 12.46s/it] + +{'loss': 0.5209, 'learning_rate': 1.6007782602968288e-05, 'epoch': 0.32} + + 32%|███▏ | 2332/7378 [8:00:25<17:27:57, 12.46s/it] + 32%|███▏ | 2333/7378 [8:00:37<17:20:55, 12.38s/it] + +{'loss': 0.487, 'learning_rate': 1.6004272467768824e-05, 'epoch': 0.32} + + 32%|███▏ | 2333/7378 [8:00:37<17:20:55, 12.38s/it] + 32%|███▏ | 2334/7378 [8:00:49<17:20:25, 12.38s/it] + +{'loss': 0.4446, 'learning_rate': 1.6000761175339944e-05, 'epoch': 0.32} + + 32%|███▏ | 2334/7378 [8:00:49<17:20:25, 12.38s/it] + 32%|███▏ | 2335/7378 [8:01:01<17:11:08, 12.27s/it] + +{'loss': 0.4582, 'learning_rate': 1.599724872635839e-05, 'epoch': 0.32} + + 32%|███▏ | 2335/7378 [8:01:01<17:11:08, 12.27s/it] + 32%|███▏ | 2336/7378 [8:01:14<17:27:24, 12.46s/it] + +{'loss': 0.4927, 'learning_rate': 1.5993735121501128e-05, 'epoch': 0.32} + + 32%|███▏ | 2336/7378 [8:01:14<17:27:24, 12.46s/it] + 32%|███▏ | 2337/7378 [8:01:26<17:24:14, 12.43s/it] + +{'loss': 0.4339, 'learning_rate': 1.5990220361445353e-05, 'epoch': 0.32} + + 32%|███▏ | 2337/7378 [8:01:26<17:24:14, 12.43s/it] + 32%|███▏ | 2338/7378 [8:01:39<17:25:59, 12.45s/it] + +{'loss': 0.4605, 'learning_rate': 1.5986704446868482e-05, 'epoch': 0.32} + + 32%|███▏ | 2338/7378 [8:01:39<17:25:59, 12.45s/it] + 32%|███▏ | 2339/7378 [8:01:52<17:34:41, 12.56s/it] + +{'loss': 0.4257, 'learning_rate': 1.5983187378448152e-05, 'epoch': 0.32} + + 32%|███▏ | 2339/7378 [8:01:52<17:34:41, 12.56s/it] + 32%|███▏ | 2340/7378 [8:02:04<17:23:35, 12.43s/it] + +{'loss': 0.4475, 'learning_rate': 1.5979669156862222e-05, 'epoch': 0.32} + + 32%|███▏ | 2340/7378 [8:02:04<17:23:35, 12.43s/it] + 32%|███▏ | 2341/7378 [8:02:16<17:24:13, 12.44s/it] + +{'loss': 0.4921, 'learning_rate': 1.597614978278877e-05, 'epoch': 0.32} + + 32%|███▏ | 2341/7378 [8:02:16<17:24:13, 12.44s/it] + 32%|███▏ | 2342/7378 [8:02:29<17:22:27, 12.42s/it] + +{'loss': 0.4212, 'learning_rate': 1.5972629256906105e-05, 'epoch': 0.32} + + 
32%|███▏ | 2342/7378 [8:02:29<17:22:27, 12.42s/it] + 32%|███▏ | 2343/7378 [8:02:41<17:17:19, 12.36s/it] + +{'loss': 0.5068, 'learning_rate': 1.5969107579892754e-05, 'epoch': 0.32} + + 32%|███▏ | 2343/7378 [8:02:41<17:17:19, 12.36s/it] + 32%|███▏ | 2344/7378 [8:02:54<17:30:44, 12.52s/it] + +{'loss': 0.461, 'learning_rate': 1.5965584752427463e-05, 'epoch': 0.32} + + 32%|███▏ | 2344/7378 [8:02:54<17:30:44, 12.52s/it] + 32%|███▏ | 2345/7378 [8:03:06<17:24:14, 12.45s/it] + +{'loss': 0.4927, 'learning_rate': 1.59620607751892e-05, 'epoch': 0.32} + + 32%|███▏ | 2345/7378 [8:03:06<17:24:14, 12.45s/it] + 32%|███▏ | 2346/7378 [8:03:18<17:18:20, 12.38s/it] + +{'loss': 0.418, 'learning_rate': 1.5958535648857157e-05, 'epoch': 0.32} + + 32%|███▏ | 2346/7378 [8:03:18<17:18:20, 12.38s/it] + 32%|███▏ | 2347/7378 [8:03:31<17:32:03, 12.55s/it] + +{'loss': 0.487, 'learning_rate': 1.595500937411075e-05, 'epoch': 0.32} + + 32%|███▏ | 2347/7378 [8:03:31<17:32:03, 12.55s/it] + 32%|███▏ | 2348/7378 [8:03:43<17:21:13, 12.42s/it] + +{'loss': 0.4692, 'learning_rate': 1.595148195162961e-05, 'epoch': 0.32} + + 32%|███▏ | 2348/7378 [8:03:43<17:21:13, 12.42s/it] + 32%|███▏ | 2349/7378 [8:03:56<17:33:01, 12.56s/it] + +{'loss': 0.4959, 'learning_rate': 1.5947953382093593e-05, 'epoch': 0.32} + + 32%|███▏ | 2349/7378 [8:03:56<17:33:01, 12.56s/it] + 32%|███▏ | 2350/7378 [8:04:09<17:33:49, 12.58s/it] + +{'loss': 0.4996, 'learning_rate': 1.5944423666182776e-05, 'epoch': 0.32} + + 32%|███▏ | 2350/7378 [8:04:09<17:33:49, 12.58s/it] + 32%|███▏ | 2351/7378 [8:04:21<17:21:34, 12.43s/it] + +{'loss': 0.4603, 'learning_rate': 1.594089280457746e-05, 'epoch': 0.32} + + 32%|███▏ | 2351/7378 [8:04:21<17:21:34, 12.43s/it] + 32%|███▏ | 2352/7378 [8:04:33<17:18:49, 12.40s/it] + +{'loss': 0.4728, 'learning_rate': 1.5937360797958157e-05, 'epoch': 0.32} + + 32%|███▏ | 2352/7378 [8:04:33<17:18:49, 12.40s/it] + 32%|███▏ | 2353/7378 [8:04:46<17:16:05, 12.37s/it] + +{'loss': 0.4753, 'learning_rate': 1.593382764700561e-05, 
'epoch': 0.32} + + 32%|███▏ | 2353/7378 [8:04:46<17:16:05, 12.37s/it] + 32%|███▏ | 2354/7378 [8:04:58<17:12:58, 12.34s/it] + +{'loss': 0.5054, 'learning_rate': 1.5930293352400776e-05, 'epoch': 0.32} + + 32%|███▏ | 2354/7378 [8:04:58<17:12:58, 12.34s/it] + 32%|███▏ | 2355/7378 [8:05:10<17:07:43, 12.28s/it] + +{'loss': 0.4895, 'learning_rate': 1.5926757914824837e-05, 'epoch': 0.32} + + 32%|███▏ | 2355/7378 [8:05:10<17:07:43, 12.28s/it] + 32%|███▏ | 2356/7378 [8:05:22<17:05:17, 12.25s/it] + +{'loss': 0.4465, 'learning_rate': 1.592322133495919e-05, 'epoch': 0.32} + + 32%|███▏ | 2356/7378 [8:05:22<17:05:17, 12.25s/it] + 32%|███▏ | 2357/7378 [8:05:34<16:58:03, 12.17s/it] + +{'loss': 0.5035, 'learning_rate': 1.5919683613485458e-05, 'epoch': 0.32} + + 32%|███▏ | 2357/7378 [8:05:34<16:58:03, 12.17s/it] + 32%|███▏ | 2358/7378 [8:05:46<16:58:26, 12.17s/it] + +{'loss': 0.4579, 'learning_rate': 1.5916144751085485e-05, 'epoch': 0.32} + + 32%|███▏ | 2358/7378 [8:05:46<16:58:26, 12.17s/it] + 32%|███▏ | 2359/7378 [8:05:59<17:00:04, 12.19s/it] + +{'loss': 0.3954, 'learning_rate': 1.5912604748441323e-05, 'epoch': 0.32} + + 32%|███▏ | 2359/7378 [8:05:59<17:00:04, 12.19s/it] + 32%|███▏ | 2360/7378 [8:06:11<17:05:11, 12.26s/it] + +{'loss': 0.4466, 'learning_rate': 1.590906360623526e-05, 'epoch': 0.32} + + 32%|███▏ | 2360/7378 [8:06:11<17:05:11, 12.26s/it] + 32%|███▏ | 2361/7378 [8:06:23<16:55:57, 12.15s/it] + +{'loss': 0.5263, 'learning_rate': 1.5905521325149788e-05, 'epoch': 0.32} + + 32%|███▏ | 2361/7378 [8:06:23<16:55:57, 12.15s/it] + 32%|███▏ | 2362/7378 [8:06:35<16:57:25, 12.17s/it] + +{'loss': 0.482, 'learning_rate': 1.5901977905867634e-05, 'epoch': 0.32} + + 32%|███▏ | 2362/7378 [8:06:35<16:57:25, 12.17s/it] + 32%|███▏ | 2363/7378 [8:06:47<16:51:39, 12.10s/it] + +{'loss': 0.4997, 'learning_rate': 1.5898433349071727e-05, 'epoch': 0.32} + + 32%|███▏ | 2363/7378 [8:06:47<16:51:39, 12.10s/it] + 32%|███▏ | 2364/7378 [8:06:59<16:58:56, 12.19s/it] + +{'loss': 0.3972, 'learning_rate': 
1.589488765544524e-05, 'epoch': 0.32} + + 32%|███▏ | 2364/7378 [8:06:59<16:58:56, 12.19s/it] + 32%|███▏ | 2365/7378 [8:07:12<17:07:00, 12.29s/it] + +{'loss': 0.5309, 'learning_rate': 1.5891340825671533e-05, 'epoch': 0.32} + + 32%|███▏ | 2365/7378 [8:07:12<17:07:00, 12.29s/it] + 32%|███▏ | 2366/7378 [8:07:24<16:59:44, 12.21s/it] + +{'loss': 0.5128, 'learning_rate': 1.5887792860434207e-05, 'epoch': 0.32} + + 32%|███▏ | 2366/7378 [8:07:24<16:59:44, 12.21s/it] + 32%|███▏ | 2367/7378 [8:07:38<17:44:14, 12.74s/it] + +{'loss': 0.5302, 'learning_rate': 1.5884243760417083e-05, 'epoch': 0.32} + + 32%|███▏ | 2367/7378 [8:07:38<17:44:14, 12.74s/it] + 32%|███▏ | 2368/7378 [8:07:50<17:34:10, 12.62s/it] + +{'loss': 0.5177, 'learning_rate': 1.5880693526304192e-05, 'epoch': 0.32} + + 32%|███▏ | 2368/7378 [8:07:50<17:34:10, 12.62s/it] + 32%|███▏ | 2369/7378 [8:08:03<17:30:16, 12.58s/it] + +{'loss': 0.4836, 'learning_rate': 1.587714215877978e-05, 'epoch': 0.32} + + 32%|███▏ | 2369/7378 [8:08:03<17:30:16, 12.58s/it] + 32%|███▏ | 2370/7378 [8:08:15<17:18:31, 12.44s/it] + +{'loss': 0.4436, 'learning_rate': 1.5873589658528326e-05, 'epoch': 0.32} + + 32%|███▏ | 2370/7378 [8:08:15<17:18:31, 12.44s/it] + 32%|███▏ | 2371/7378 [8:08:27<17:19:46, 12.46s/it] + +{'loss': 0.453, 'learning_rate': 1.587003602623451e-05, 'epoch': 0.32} + + 32%|███▏ | 2371/7378 [8:08:27<17:19:46, 12.46s/it] + 32%|███▏ | 2372/7378 [8:08:40<17:14:26, 12.40s/it] + +{'loss': 0.5418, 'learning_rate': 1.5866481262583245e-05, 'epoch': 0.32} + + 32%|███▏ | 2372/7378 [8:08:40<17:14:26, 12.40s/it] + 32%|███▏ | 2373/7378 [8:08:52<17:06:37, 12.31s/it] + +{'loss': 0.4679, 'learning_rate': 1.5862925368259654e-05, 'epoch': 0.32} + + 32%|███▏ | 2373/7378 [8:08:52<17:06:37, 12.31s/it] + 32%|███▏ | 2374/7378 [8:09:04<17:15:51, 12.42s/it] + +{'loss': 0.5251, 'learning_rate': 1.5859368343949084e-05, 'epoch': 0.32} + + 32%|███▏ | 2374/7378 [8:09:04<17:15:51, 12.42s/it] + 32%|███▏ | 2375/7378 [8:09:16<17:00:00, 12.23s/it] + +{'loss': 
0.4784, 'learning_rate': 1.5855810190337088e-05, 'epoch': 0.32} + + 32%|███▏ | 2375/7378 [8:09:16<17:00:00, 12.23s/it] + 32%|███▏ | 2376/7378 [8:09:28<16:55:49, 12.18s/it] + +{'loss': 0.5398, 'learning_rate': 1.5852250908109448e-05, 'epoch': 0.32} + + 32%|███▏ | 2376/7378 [8:09:28<16:55:49, 12.18s/it] + 32%|███▏ | 2377/7378 [8:09:40<16:52:12, 12.14s/it] + +{'loss': 0.464, 'learning_rate': 1.5848690497952163e-05, 'epoch': 0.32} + + 32%|███▏ | 2377/7378 [8:09:40<16:52:12, 12.14s/it] + 32%|███▏ | 2378/7378 [8:09:52<16:51:53, 12.14s/it] + +{'loss': 0.481, 'learning_rate': 1.584512896055144e-05, 'epoch': 0.32} + + 32%|███▏ | 2378/7378 [8:09:52<16:51:53, 12.14s/it] + 32%|███▏ | 2379/7378 [8:10:05<16:57:06, 12.21s/it] + +{'loss': 0.4848, 'learning_rate': 1.5841566296593714e-05, 'epoch': 0.32} + + 32%|███▏ | 2379/7378 [8:10:05<16:57:06, 12.21s/it] + 32%|███▏ | 2380/7378 [8:10:17<16:48:00, 12.10s/it] + +{'loss': 0.4576, 'learning_rate': 1.583800250676563e-05, 'epoch': 0.32} + + 32%|███▏ | 2380/7378 [8:10:17<16:48:00, 12.10s/it] + 32%|███▏ | 2381/7378 [8:10:29<16:51:50, 12.15s/it] + +{'loss': 0.4728, 'learning_rate': 1.5834437591754063e-05, 'epoch': 0.32} + + 32%|███▏ | 2381/7378 [8:10:29<16:51:50, 12.15s/it] + 32%|███▏ | 2382/7378 [8:10:42<17:10:58, 12.38s/it] + +{'loss': 0.5103, 'learning_rate': 1.5830871552246076e-05, 'epoch': 0.32} + + 32%|███▏ | 2382/7378 [8:10:42<17:10:58, 12.38s/it] + 32%|███▏ | 2383/7378 [8:10:55<17:17:32, 12.46s/it] + +{'loss': 0.5038, 'learning_rate': 1.582730438892898e-05, 'epoch': 0.32} + + 32%|███▏ | 2383/7378 [8:10:55<17:17:32, 12.46s/it] + 32%|███▏ | 2384/7378 [8:11:07<17:21:10, 12.51s/it] + +{'loss': 0.4805, 'learning_rate': 1.582373610249029e-05, 'epoch': 0.32} + + 32%|███▏ | 2384/7378 [8:11:07<17:21:10, 12.51s/it] + 32%|███▏ | 2385/7378 [8:11:19<17:17:02, 12.46s/it] + +{'loss': 0.4583, 'learning_rate': 1.582016669361773e-05, 'epoch': 0.32} + + 32%|███▏ | 2385/7378 [8:11:19<17:17:02, 12.46s/it] + 32%|███▏ | 2386/7378 [8:11:32<17:11:31, 
12.40s/it] + +{'loss': 0.3871, 'learning_rate': 1.5816596162999252e-05, 'epoch': 0.32} + + 32%|███▏ | 2386/7378 [8:11:32<17:11:31, 12.40s/it] + 32%|███▏ | 2387/7378 [8:11:44<17:08:38, 12.37s/it] + +{'loss': 0.4886, 'learning_rate': 1.5813024511323017e-05, 'epoch': 0.32} + + 32%|███▏ | 2387/7378 [8:11:44<17:08:38, 12.37s/it] + 32%|███▏ | 2388/7378 [8:11:56<16:56:34, 12.22s/it] + +{'loss': 0.4497, 'learning_rate': 1.580945173927741e-05, 'epoch': 0.32} + + 32%|███▏ | 2388/7378 [8:11:56<16:56:34, 12.22s/it] + 32%|███▏ | 2389/7378 [8:12:08<16:59:01, 12.26s/it] + +{'loss': 0.4286, 'learning_rate': 1.5805877847551027e-05, 'epoch': 0.32} + + 32%|███▏ | 2389/7378 [8:12:08<16:59:01, 12.26s/it] + 32%|███▏ | 2390/7378 [8:12:20<16:53:36, 12.19s/it] + +{'loss': 0.4874, 'learning_rate': 1.5802302836832673e-05, 'epoch': 0.32} + + 32%|███▏ | 2390/7378 [8:12:20<16:53:36, 12.19s/it] + 32%|███▏ | 2391/7378 [8:12:32<16:50:52, 12.16s/it] + +{'loss': 0.4946, 'learning_rate': 1.5798726707811383e-05, 'epoch': 0.32} + + 32%|███▏ | 2391/7378 [8:12:32<16:50:52, 12.16s/it] + 32%|███▏ | 2392/7378 [8:12:45<16:53:28, 12.20s/it] + +{'loss': 0.4431, 'learning_rate': 1.5795149461176393e-05, 'epoch': 0.32} + + 32%|███▏ | 2392/7378 [8:12:45<16:53:28, 12.20s/it] + 32%|███▏ | 2393/7378 [8:12:57<16:52:17, 12.18s/it] + +{'loss': 0.438, 'learning_rate': 1.5791571097617163e-05, 'epoch': 0.32} + + 32%|███▏ | 2393/7378 [8:12:57<16:52:17, 12.18s/it] + 32%|███▏ | 2394/7378 [8:13:09<17:00:14, 12.28s/it] + +{'loss': 0.4374, 'learning_rate': 1.578799161782337e-05, 'epoch': 0.32} + + 32%|███▏ | 2394/7378 [8:13:09<17:00:14, 12.28s/it] + 32%|███▏ | 2395/7378 [8:13:21<16:47:38, 12.13s/it] + +{'loss': 0.5061, 'learning_rate': 1.57844110224849e-05, 'epoch': 0.32} + + 32%|███▏ | 2395/7378 [8:13:21<16:47:38, 12.13s/it] + 32%|███▏ | 2396/7378 [8:13:33<16:45:40, 12.11s/it] + +{'loss': 0.443, 'learning_rate': 1.5780829312291858e-05, 'epoch': 0.32} + + 32%|███▏ | 2396/7378 [8:13:33<16:45:40, 12.11s/it] + 32%|███▏ | 2397/7378 
[8:13:45<16:45:38, 12.11s/it] + +{'loss': 0.5071, 'learning_rate': 1.577724648793456e-05, 'epoch': 0.32} + + 32%|███▏ | 2397/7378 [8:13:45<16:45:38, 12.11s/it] + 33%|███▎ | 2398/7378 [8:13:58<16:50:58, 12.18s/it] + +{'loss': 0.4405, 'learning_rate': 1.577366255010354e-05, 'epoch': 0.33} + + 33%|███▎ | 2398/7378 [8:13:58<16:50:58, 12.18s/it] + 33%|███▎ | 2399/7378 [8:14:10<16:44:18, 12.10s/it] + +{'loss': 0.5088, 'learning_rate': 1.5770077499489546e-05, 'epoch': 0.33} + + 33%|███▎ | 2399/7378 [8:14:10<16:44:18, 12.10s/it] + 33%|███▎ | 2400/7378 [8:14:21<16:38:57, 12.04s/it] + +{'loss': 0.5298, 'learning_rate': 1.5766491336783543e-05, 'epoch': 0.33} + + 33%|███▎ | 2400/7378 [8:14:21<16:38:57, 12.04s/it] + 33%|███▎ | 2401/7378 [8:14:34<16:46:14, 12.13s/it] + +{'loss': 0.5149, 'learning_rate': 1.5762904062676706e-05, 'epoch': 0.33} + + 33%|███▎ | 2401/7378 [8:14:34<16:46:14, 12.13s/it] + 33%|███▎ | 2402/7378 [8:14:46<16:50:54, 12.19s/it] + +{'loss': 0.5098, 'learning_rate': 1.575931567786042e-05, 'epoch': 0.33} + + 33%|███▎ | 2402/7378 [8:14:46<16:50:54, 12.19s/it] + 33%|███▎ | 2403/7378 [8:14:58<16:50:45, 12.19s/it] + +{'loss': 0.4389, 'learning_rate': 1.5755726183026303e-05, 'epoch': 0.33} + + 33%|███▎ | 2403/7378 [8:14:58<16:50:45, 12.19s/it] + 33%|███▎ | 2404/7378 [8:15:11<17:03:43, 12.35s/it] + +{'loss': 0.5647, 'learning_rate': 1.5752135578866162e-05, 'epoch': 0.33} + + 33%|███▎ | 2404/7378 [8:15:11<17:03:43, 12.35s/it] + 33%|███▎ | 2405/7378 [8:15:23<17:03:52, 12.35s/it] + +{'loss': 0.5578, 'learning_rate': 1.5748543866072033e-05, 'epoch': 0.33} + + 33%|███▎ | 2405/7378 [8:15:23<17:03:52, 12.35s/it] + 33%|███▎ | 2406/7378 [8:15:36<16:58:06, 12.29s/it] + +{'loss': 0.5204, 'learning_rate': 1.5744951045336166e-05, 'epoch': 0.33} + + 33%|███▎ | 2406/7378 [8:15:36<16:58:06, 12.29s/it] + 33%|███▎ | 2407/7378 [8:15:48<16:57:23, 12.28s/it] + +{'loss': 0.5252, 'learning_rate': 1.5741357117351018e-05, 'epoch': 0.33} + + 33%|███▎ | 2407/7378 [8:15:48<16:57:23, 12.28s/it] + 
33%|███▎ | 2408/7378 [8:16:00<16:49:15, 12.18s/it] + +{'loss': 0.5172, 'learning_rate': 1.573776208280926e-05, 'epoch': 0.33} + + 33%|███▎ | 2408/7378 [8:16:00<16:49:15, 12.18s/it] + 33%|███▎ | 2409/7378 [8:16:13<17:08:34, 12.42s/it] + +{'loss': 0.4675, 'learning_rate': 1.5734165942403782e-05, 'epoch': 0.33} + + 33%|███▎ | 2409/7378 [8:16:13<17:08:34, 12.42s/it] + 33%|███▎ | 2410/7378 [8:16:25<17:10:32, 12.45s/it] + +{'loss': 0.4503, 'learning_rate': 1.5730568696827684e-05, 'epoch': 0.33} + + 33%|███▎ | 2410/7378 [8:16:25<17:10:32, 12.45s/it] + 33%|███▎ | 2411/7378 [8:16:38<17:08:13, 12.42s/it] + +{'loss': 0.5144, 'learning_rate': 1.572697034677428e-05, 'epoch': 0.33} + + 33%|███▎ | 2411/7378 [8:16:38<17:08:13, 12.42s/it] + 33%|███▎ | 2412/7378 [8:16:50<16:57:49, 12.30s/it] + +{'loss': 0.4382, 'learning_rate': 1.5723370892937085e-05, 'epoch': 0.33} + + 33%|███▎ | 2412/7378 [8:16:50<16:57:49, 12.30s/it] + 33%|███▎ | 2413/7378 [8:17:02<16:58:41, 12.31s/it] + +{'loss': 0.5319, 'learning_rate': 1.571977033600985e-05, 'epoch': 0.33} + + 33%|███▎ | 2413/7378 [8:17:02<16:58:41, 12.31s/it] + 33%|███▎ | 2414/7378 [8:17:14<16:59:34, 12.32s/it] + +{'loss': 0.5026, 'learning_rate': 1.5716168676686523e-05, 'epoch': 0.33} + + 33%|███▎ | 2414/7378 [8:17:14<16:59:34, 12.32s/it] + 33%|███▎ | 2415/7378 [8:17:27<17:06:15, 12.41s/it] + +{'loss': 0.465, 'learning_rate': 1.5712565915661264e-05, 'epoch': 0.33} + + 33%|███▎ | 2415/7378 [8:17:27<17:06:15, 12.41s/it] + 33%|███▎ | 2416/7378 [8:17:39<16:54:56, 12.27s/it] + +{'loss': 0.5126, 'learning_rate': 1.570896205362845e-05, 'epoch': 0.33} + + 33%|███▎ | 2416/7378 [8:17:39<16:54:56, 12.27s/it] + 33%|███▎ | 2417/7378 [8:17:51<16:52:11, 12.24s/it] + +{'loss': 0.3874, 'learning_rate': 1.570535709128267e-05, 'epoch': 0.33} + + 33%|███▎ | 2417/7378 [8:17:51<16:52:11, 12.24s/it] + 33%|███▎ | 2418/7378 [8:18:03<16:55:33, 12.28s/it] + +{'loss': 0.538, 'learning_rate': 1.5701751029318723e-05, 'epoch': 0.33} + + 33%|███▎ | 2418/7378 
[8:18:03<16:55:33, 12.28s/it] + 33%|███▎ | 2419/7378 [8:18:15<16:44:43, 12.16s/it] + +{'loss': 0.4883, 'learning_rate': 1.5698143868431617e-05, 'epoch': 0.33} + + 33%|███▎ | 2419/7378 [8:18:15<16:44:43, 12.16s/it] + 33%|███▎ | 2420/7378 [8:18:28<16:58:25, 12.32s/it] + +{'loss': 0.5142, 'learning_rate': 1.5694535609316585e-05, 'epoch': 0.33} + + 33%|███▎ | 2420/7378 [8:18:28<16:58:25, 12.32s/it] + 33%|███▎ | 2421/7378 [8:18:40<16:58:56, 12.33s/it] + +{'loss': 0.4389, 'learning_rate': 1.5690926252669058e-05, 'epoch': 0.33} + + 33%|███▎ | 2421/7378 [8:18:40<16:58:56, 12.33s/it] + 33%|███▎ | 2422/7378 [8:18:53<17:01:26, 12.37s/it] + +{'loss': 0.4765, 'learning_rate': 1.568731579918468e-05, 'epoch': 0.33} + + 33%|███▎ | 2422/7378 [8:18:53<17:01:26, 12.37s/it] + 33%|███▎ | 2423/7378 [8:19:05<17:00:25, 12.36s/it] + +{'loss': 0.4486, 'learning_rate': 1.568370424955931e-05, 'epoch': 0.33} + + 33%|███▎ | 2423/7378 [8:19:05<17:00:25, 12.36s/it] + 33%|███▎ | 2424/7378 [8:19:17<17:00:04, 12.35s/it] + +{'loss': 0.5427, 'learning_rate': 1.568009160448902e-05, 'epoch': 0.33} + + 33%|███▎ | 2424/7378 [8:19:17<17:00:04, 12.35s/it] + 33%|███▎ | 2425/7378 [8:19:30<17:02:21, 12.38s/it] + +{'loss': 0.5218, 'learning_rate': 1.5676477864670093e-05, 'epoch': 0.33} + + 33%|███▎ | 2425/7378 [8:19:30<17:02:21, 12.38s/it] + 33%|███▎ | 2426/7378 [8:19:42<17:01:45, 12.38s/it] + +{'loss': 0.4311, 'learning_rate': 1.5672863030799015e-05, 'epoch': 0.33} + + 33%|███▎ | 2426/7378 [8:19:42<17:01:45, 12.38s/it] + 33%|███▎ | 2427/7378 [8:19:54<16:53:37, 12.28s/it] + +{'loss': 0.5264, 'learning_rate': 1.5669247103572493e-05, 'epoch': 0.33} + + 33%|███▎ | 2427/7378 [8:19:54<16:53:37, 12.28s/it] + 33%|███▎ | 2428/7378 [8:20:06<16:47:55, 12.22s/it] + +{'loss': 0.5005, 'learning_rate': 1.5665630083687438e-05, 'epoch': 0.33} + + 33%|███▎ | 2428/7378 [8:20:06<16:47:55, 12.22s/it] + 33%|███▎ | 2429/7378 [8:20:19<16:59:22, 12.36s/it] + +{'loss': 0.4921, 'learning_rate': 1.5662011971840972e-05, 'epoch': 0.33} + + 
33%|███▎ | 2429/7378 [8:20:19<16:59:22, 12.36s/it] + 33%|███▎ | 2430/7378 [8:20:31<16:58:19, 12.35s/it] + +{'loss': 0.4133, 'learning_rate': 1.5658392768730434e-05, 'epoch': 0.33} + + 33%|███▎ | 2430/7378 [8:20:31<16:58:19, 12.35s/it] + 33%|███▎ | 2431/7378 [8:20:44<17:06:34, 12.45s/it] + +{'loss': 0.4581, 'learning_rate': 1.5654772475053365e-05, 'epoch': 0.33} + + 33%|███▎ | 2431/7378 [8:20:44<17:06:34, 12.45s/it] + 33%|███▎ | 2432/7378 [8:20:56<16:57:28, 12.34s/it] + +{'loss': 0.4768, 'learning_rate': 1.565115109150752e-05, 'epoch': 0.33} + + 33%|███▎ | 2432/7378 [8:20:56<16:57:28, 12.34s/it] + 33%|███▎ | 2433/7378 [8:21:09<16:58:33, 12.36s/it] + +{'loss': 0.436, 'learning_rate': 1.5647528618790872e-05, 'epoch': 0.33} + + 33%|███▎ | 2433/7378 [8:21:09<16:58:33, 12.36s/it] + 33%|███▎ | 2434/7378 [8:21:21<16:55:33, 12.32s/it] + +{'loss': 0.4941, 'learning_rate': 1.5643905057601583e-05, 'epoch': 0.33} + + 33%|███▎ | 2434/7378 [8:21:21<16:55:33, 12.32s/it] + 33%|███▎ | 2435/7378 [8:21:33<16:46:41, 12.22s/it] + +{'loss': 0.4925, 'learning_rate': 1.5640280408638044e-05, 'epoch': 0.33} + + 33%|███▎ | 2435/7378 [8:21:33<16:46:41, 12.22s/it] + 33%|███▎ | 2436/7378 [8:21:45<16:50:36, 12.27s/it] + +{'loss': 0.5256, 'learning_rate': 1.563665467259885e-05, 'epoch': 0.33} + + 33%|███▎ | 2436/7378 [8:21:45<16:50:36, 12.27s/it] + 33%|███▎ | 2437/7378 [8:21:57<16:46:11, 12.22s/it] + +{'loss': 0.5113, 'learning_rate': 1.5633027850182803e-05, 'epoch': 0.33} + + 33%|███▎ | 2437/7378 [8:21:57<16:46:11, 12.22s/it] + 33%|███▎ | 2438/7378 [8:22:10<16:46:43, 12.23s/it] + +{'loss': 0.5157, 'learning_rate': 1.562939994208892e-05, 'epoch': 0.33} + + 33%|███▎ | 2438/7378 [8:22:10<16:46:43, 12.23s/it] + 33%|███▎ | 2439/7378 [8:22:22<16:42:20, 12.18s/it] + +{'loss': 0.4455, 'learning_rate': 1.562577094901642e-05, 'epoch': 0.33} + + 33%|███▎ | 2439/7378 [8:22:22<16:42:20, 12.18s/it] + 33%|███▎ | 2440/7378 [8:22:33<16:33:32, 12.07s/it] + +{'loss': 0.4666, 'learning_rate': 1.5622140871664733e-05, 
'epoch': 0.33} + + 33%|███▎ | 2440/7378 [8:22:33<16:33:32, 12.07s/it] + 33%|███▎ | 2441/7378 [8:22:46<16:34:58, 12.09s/it] + +{'loss': 0.4323, 'learning_rate': 1.5618509710733502e-05, 'epoch': 0.33} + + 33%|███▎ | 2441/7378 [8:22:46<16:34:58, 12.09s/it] + 33%|███▎ | 2442/7378 [8:22:58<16:31:19, 12.05s/it] + +{'loss': 0.4644, 'learning_rate': 1.5614877466922574e-05, 'epoch': 0.33} + + 33%|███▎ | 2442/7378 [8:22:58<16:31:19, 12.05s/it] + 33%|███▎ | 2443/7378 [8:23:10<16:35:19, 12.10s/it] + +{'loss': 0.484, 'learning_rate': 1.5611244140932013e-05, 'epoch': 0.33} + + 33%|███▎ | 2443/7378 [8:23:10<16:35:19, 12.10s/it] + 33%|███▎ | 2444/7378 [8:23:22<16:46:32, 12.24s/it] + +{'loss': 0.434, 'learning_rate': 1.5607609733462076e-05, 'epoch': 0.33} + + 33%|███▎ | 2444/7378 [8:23:22<16:46:32, 12.24s/it] + 33%|███▎ | 2445/7378 [8:23:35<16:48:13, 12.26s/it] + +{'loss': 0.5217, 'learning_rate': 1.5603974245213247e-05, 'epoch': 0.33} + + 33%|███▎ | 2445/7378 [8:23:35<16:48:13, 12.26s/it] + 33%|███▎ | 2446/7378 [8:23:47<16:46:40, 12.25s/it] + +{'loss': 0.4753, 'learning_rate': 1.5600337676886205e-05, 'epoch': 0.33} + + 33%|███▎ | 2446/7378 [8:23:47<16:46:40, 12.25s/it] + 33%|███▎ | 2447/7378 [8:23:59<16:36:48, 12.13s/it] + +{'loss': 0.4672, 'learning_rate': 1.5596700029181843e-05, 'epoch': 0.33} + + 33%|███▎ | 2447/7378 [8:23:59<16:36:48, 12.13s/it] + 33%|███▎ | 2448/7378 [8:24:11<16:34:50, 12.11s/it] + +{'loss': 0.5108, 'learning_rate': 1.5593061302801263e-05, 'epoch': 0.33} + + 33%|███▎ | 2448/7378 [8:24:11<16:34:50, 12.11s/it] + 33%|███▎ | 2449/7378 [8:24:23<16:37:01, 12.14s/it] + +{'loss': 0.4821, 'learning_rate': 1.5589421498445765e-05, 'epoch': 0.33} + + 33%|███▎ | 2449/7378 [8:24:23<16:37:01, 12.14s/it] + 33%|███▎ | 2450/7378 [8:24:35<16:36:44, 12.14s/it] + +{'loss': 0.4942, 'learning_rate': 1.558578061681687e-05, 'epoch': 0.33} + + 33%|███▎ | 2450/7378 [8:24:35<16:36:44, 12.14s/it] + 33%|███▎ | 2451/7378 [8:24:47<16:42:30, 12.21s/it] + +{'loss': 0.5054, 'learning_rate': 
1.55821386586163e-05, 'epoch': 0.33} + + 33%|███▎ | 2451/7378 [8:24:47<16:42:30, 12.21s/it] + 33%|███▎ | 2452/7378 [8:24:59<16:37:51, 12.15s/it] + +{'loss': 0.4709, 'learning_rate': 1.5578495624545988e-05, 'epoch': 0.33} + + 33%|███▎ | 2452/7378 [8:24:59<16:37:51, 12.15s/it] + 33%|███▎ | 2453/7378 [8:25:12<16:39:18, 12.17s/it] + +{'loss': 0.4251, 'learning_rate': 1.5574851515308063e-05, 'epoch': 0.33} + + 33%|███▎ | 2453/7378 [8:25:12<16:39:18, 12.17s/it] + 33%|███▎ | 2454/7378 [8:25:24<16:39:14, 12.18s/it] + +{'loss': 0.5593, 'learning_rate': 1.5571206331604885e-05, 'epoch': 0.33} + + 33%|███▎ | 2454/7378 [8:25:24<16:39:14, 12.18s/it] + 33%|███▎ | 2455/7378 [8:25:36<16:43:14, 12.23s/it] + +{'loss': 0.463, 'learning_rate': 1.556756007413899e-05, 'epoch': 0.33} + + 33%|███▎ | 2455/7378 [8:25:36<16:43:14, 12.23s/it] + 33%|███▎ | 2456/7378 [8:25:49<16:52:19, 12.34s/it] + +{'loss': 0.4715, 'learning_rate': 1.5563912743613143e-05, 'epoch': 0.33} + + 33%|███▎ | 2456/7378 [8:25:49<16:52:19, 12.34s/it] + 33%|███▎ | 2457/7378 [8:26:01<16:46:15, 12.27s/it] + +{'loss': 0.4485, 'learning_rate': 1.5560264340730315e-05, 'epoch': 0.33} + + 33%|███▎ | 2457/7378 [8:26:01<16:46:15, 12.27s/it] + 33%|███▎ | 2458/7378 [8:26:13<16:47:58, 12.29s/it] + +{'loss': 0.4522, 'learning_rate': 1.555661486619367e-05, 'epoch': 0.33} + + 33%|███▎ | 2458/7378 [8:26:13<16:47:58, 12.29s/it] + 33%|███▎ | 2459/7378 [8:26:25<16:45:05, 12.26s/it] + +{'loss': 0.4443, 'learning_rate': 1.5552964320706593e-05, 'epoch': 0.33} + + 33%|███▎ | 2459/7378 [8:26:25<16:45:05, 12.26s/it] + 33%|███▎ | 2460/7378 [8:26:37<16:37:35, 12.17s/it] + +{'loss': 0.4285, 'learning_rate': 1.5549312704972667e-05, 'epoch': 0.33} + + 33%|███▎ | 2460/7378 [8:26:37<16:37:35, 12.17s/it] + 33%|███▎ | 2461/7378 [8:26:50<16:47:01, 12.29s/it] + +{'loss': 0.5019, 'learning_rate': 1.5545660019695684e-05, 'epoch': 0.33} + + 33%|███▎ | 2461/7378 [8:26:50<16:47:01, 12.29s/it] + 33%|███▎ | 2462/7378 [8:27:03<16:52:27, 12.36s/it] + +{'loss': 
0.4948, 'learning_rate': 1.5542006265579643e-05, 'epoch': 0.33} + + 33%|███▎ | 2462/7378 [8:27:03<16:52:27, 12.36s/it] + 33%|███▎ | 2463/7378 [8:27:15<16:50:22, 12.33s/it] + +{'loss': 0.4926, 'learning_rate': 1.5538351443328747e-05, 'epoch': 0.33} + + 33%|███▎ | 2463/7378 [8:27:15<16:50:22, 12.33s/it] + 33%|███▎ | 2464/7378 [8:27:30<17:54:55, 13.12s/it] + +{'loss': 0.4352, 'learning_rate': 1.5534695553647404e-05, 'epoch': 0.33} + + 33%|███▎ | 2464/7378 [8:27:30<17:54:55, 13.12s/it] + 33%|███▎ | 2465/7378 [8:27:42<17:34:12, 12.87s/it] + +{'loss': 0.5171, 'learning_rate': 1.5531038597240232e-05, 'epoch': 0.33} + + 33%|███▎ | 2465/7378 [8:27:42<17:34:12, 12.87s/it] + 33%|███▎ | 2466/7378 [8:27:54<17:23:11, 12.74s/it] + +{'loss': 0.4589, 'learning_rate': 1.5527380574812054e-05, 'epoch': 0.33} + + 33%|███▎ | 2466/7378 [8:27:54<17:23:11, 12.74s/it] + 33%|███▎ | 2467/7378 [8:28:07<17:21:03, 12.72s/it] + +{'loss': 0.5229, 'learning_rate': 1.5523721487067895e-05, 'epoch': 0.33} + + 33%|███▎ | 2467/7378 [8:28:07<17:21:03, 12.72s/it] + 33%|███▎ | 2468/7378 [8:28:20<17:17:52, 12.68s/it] + +{'loss': 0.462, 'learning_rate': 1.5520061334712978e-05, 'epoch': 0.33} + + 33%|███▎ | 2468/7378 [8:28:20<17:17:52, 12.68s/it] + 33%|███▎ | 2469/7378 [8:28:32<17:11:42, 12.61s/it] + +{'loss': 0.5257, 'learning_rate': 1.551640011845275e-05, 'epoch': 0.33} + + 33%|███▎ | 2469/7378 [8:28:32<17:11:42, 12.61s/it] + 33%|███▎ | 2470/7378 [8:28:44<16:53:33, 12.39s/it] + +{'loss': 0.4521, 'learning_rate': 1.5512737838992852e-05, 'epoch': 0.33} + + 33%|███▎ | 2470/7378 [8:28:44<16:53:33, 12.39s/it] + 33%|███▎ | 2471/7378 [8:28:56<16:38:10, 12.21s/it] + +{'loss': 0.4413, 'learning_rate': 1.550907449703913e-05, 'epoch': 0.33} + + 33%|███▎ | 2471/7378 [8:28:56<16:38:10, 12.21s/it] + 34%|███▎ | 2472/7378 [8:29:08<16:38:02, 12.21s/it] + +{'loss': 0.4875, 'learning_rate': 1.5505410093297633e-05, 'epoch': 0.34} + + 34%|███▎ | 2472/7378 [8:29:08<16:38:02, 12.21s/it] + 34%|███▎ | 2473/7378 [8:29:20<16:39:38, 
12.23s/it] + +{'loss': 0.4555, 'learning_rate': 1.5501744628474623e-05, 'epoch': 0.34} + + 34%|███▎ | 2473/7378 [8:29:20<16:39:38, 12.23s/it] + 34%|███▎ | 2474/7378 [8:29:33<16:41:23, 12.25s/it] + +{'loss': 0.4861, 'learning_rate': 1.5498078103276555e-05, 'epoch': 0.34} + + 34%|███▎ | 2474/7378 [8:29:33<16:41:23, 12.25s/it] + 34%|███▎ | 2475/7378 [8:29:45<16:44:04, 12.29s/it] + +{'loss': 0.4762, 'learning_rate': 1.5494410518410096e-05, 'epoch': 0.34} + + 34%|███▎ | 2475/7378 [8:29:45<16:44:04, 12.29s/it] + 34%|███▎ | 2476/7378 [8:29:57<16:47:08, 12.33s/it] + +{'loss': 0.529, 'learning_rate': 1.5490741874582117e-05, 'epoch': 0.34} + + 34%|███▎ | 2476/7378 [8:29:57<16:47:08, 12.33s/it] + 34%|███▎ | 2477/7378 [8:30:10<16:53:50, 12.41s/it] + +{'loss': 0.5261, 'learning_rate': 1.5487072172499696e-05, 'epoch': 0.34} + + 34%|███▎ | 2477/7378 [8:30:10<16:53:50, 12.41s/it] + 34%|███▎ | 2478/7378 [8:30:23<16:57:49, 12.46s/it] + +{'loss': 0.4817, 'learning_rate': 1.5483401412870097e-05, 'epoch': 0.34} + + 34%|███▎ | 2478/7378 [8:30:23<16:57:49, 12.46s/it] + 34%|███▎ | 2479/7378 [8:30:35<16:48:41, 12.35s/it] + +{'loss': 0.444, 'learning_rate': 1.5479729596400814e-05, 'epoch': 0.34} + + 34%|███▎ | 2479/7378 [8:30:35<16:48:41, 12.35s/it] + 34%|███▎ | 2480/7378 [8:30:47<16:51:38, 12.39s/it] + +{'loss': 0.453, 'learning_rate': 1.5476056723799532e-05, 'epoch': 0.34} + + 34%|███▎ | 2480/7378 [8:30:47<16:51:38, 12.39s/it] + 34%|███▎ | 2481/7378 [8:30:59<16:48:44, 12.36s/it] + +{'loss': 0.4799, 'learning_rate': 1.5472382795774127e-05, 'epoch': 0.34} + + 34%|███▎ | 2481/7378 [8:30:59<16:48:44, 12.36s/it] + 34%|███▎ | 2482/7378 [8:31:12<16:45:49, 12.33s/it] + +{'loss': 0.5177, 'learning_rate': 1.5468707813032705e-05, 'epoch': 0.34} + + 34%|███▎ | 2482/7378 [8:31:12<16:45:49, 12.33s/it] + 34%|███▎ | 2483/7378 [8:31:25<16:56:51, 12.46s/it] + +{'loss': 0.5074, 'learning_rate': 1.5465031776283555e-05, 'epoch': 0.34} + + 34%|███▎ | 2483/7378 [8:31:25<16:56:51, 12.46s/it] + 34%|███▎ | 
2484/7378 [8:31:37<16:50:16, 12.39s/it] + +{'loss': 0.4513, 'learning_rate': 1.5461354686235175e-05, 'epoch': 0.34} + + 34%|███▎ | 2484/7378 [8:31:37<16:50:16, 12.39s/it] + 34%|███▎ | 2485/7378 [8:31:49<16:47:37, 12.36s/it] + +{'loss': 0.4795, 'learning_rate': 1.545767654359627e-05, 'epoch': 0.34} + + 34%|███▎ | 2485/7378 [8:31:49<16:47:37, 12.36s/it] + 34%|███▎ | 2486/7378 [8:32:01<16:48:55, 12.37s/it] + +{'loss': 0.4839, 'learning_rate': 1.5453997349075742e-05, 'epoch': 0.34} + + 34%|███▎ | 2486/7378 [8:32:01<16:48:55, 12.37s/it] + 34%|███▎ | 2487/7378 [8:32:14<16:46:39, 12.35s/it] + +{'loss': 0.497, 'learning_rate': 1.5450317103382695e-05, 'epoch': 0.34} + + 34%|███▎ | 2487/7378 [8:32:14<16:46:39, 12.35s/it] + 34%|███▎ | 2488/7378 [8:32:27<17:14:41, 12.70s/it] + +{'loss': 0.4612, 'learning_rate': 1.5446635807226445e-05, 'epoch': 0.34} + + 34%|███▎ | 2488/7378 [8:32:27<17:14:41, 12.70s/it] + 34%|███▎ | 2489/7378 [8:32:42<18:05:54, 13.33s/it] + +{'loss': 0.4251, 'learning_rate': 1.5442953461316504e-05, 'epoch': 0.34} + + 34%|███▎ | 2489/7378 [8:32:42<18:05:54, 13.33s/it] + 34%|███▎ | 2490/7378 [8:32:58<19:04:59, 14.05s/it] + +{'loss': 0.4573, 'learning_rate': 1.543927006636258e-05, 'epoch': 0.34} + + 34%|███▎ | 2490/7378 [8:32:58<19:04:59, 14.05s/it] + 34%|███▍ | 2491/7378 [8:33:13<19:43:53, 14.54s/it] + +{'loss': 0.5498, 'learning_rate': 1.5435585623074594e-05, 'epoch': 0.34} + + 34%|███▍ | 2491/7378 [8:33:13<19:43:53, 14.54s/it] + 34%|███▍ | 2492/7378 [8:33:32<21:32:06, 15.87s/it] + +{'loss': 0.5116, 'learning_rate': 1.5431900132162666e-05, 'epoch': 0.34} + + 34%|███▍ | 2492/7378 [8:33:32<21:32:06, 15.87s/it] + 34%|███▍ | 2493/7378 [8:33:44<19:57:46, 14.71s/it] + +{'loss': 0.4686, 'learning_rate': 1.542821359433711e-05, 'epoch': 0.34} + + 34%|███▍ | 2493/7378 [8:33:44<19:57:46, 14.71s/it] + 34%|███▍ | 2494/7378 [8:33:57<18:55:19, 13.95s/it] + +{'loss': 0.4074, 'learning_rate': 1.5424526010308458e-05, 'epoch': 0.34} + + 34%|███▍ | 2494/7378 [8:33:57<18:55:19, 
13.95s/it] + 34%|███▍ | 2495/7378 [8:34:09<18:10:42, 13.40s/it] + +{'loss': 0.5285, 'learning_rate': 1.5420837380787427e-05, 'epoch': 0.34} + + 34%|███▍ | 2495/7378 [8:34:09<18:10:42, 13.40s/it] + 34%|███▍ | 2496/7378 [8:34:21<17:44:35, 13.08s/it] + +{'loss': 0.4729, 'learning_rate': 1.5417147706484944e-05, 'epoch': 0.34} + + 34%|███▍ | 2496/7378 [8:34:21<17:44:35, 13.08s/it] + 34%|███▍ | 2497/7378 [8:34:33<17:16:36, 12.74s/it] + +{'loss': 0.539, 'learning_rate': 1.5413456988112136e-05, 'epoch': 0.34} + + 34%|███▍ | 2497/7378 [8:34:33<17:16:36, 12.74s/it] + 34%|███▍ | 2498/7378 [8:34:45<17:04:37, 12.60s/it] + +{'loss': 0.524, 'learning_rate': 1.540976522638033e-05, 'epoch': 0.34} + + 34%|███▍ | 2498/7378 [8:34:45<17:04:37, 12.60s/it] + 34%|███▍ | 2499/7378 [8:34:58<16:58:56, 12.53s/it] + +{'loss': 0.4534, 'learning_rate': 1.5406072422001062e-05, 'epoch': 0.34} + + 34%|███▍ | 2499/7378 [8:34:58<16:58:56, 12.53s/it] + 34%|███▍ | 2500/7378 [8:35:10<16:48:02, 12.40s/it] + +{'loss': 0.5615, 'learning_rate': 1.5402378575686054e-05, 'epoch': 0.34} + + 34%|███▍ | 2500/7378 [8:35:10<16:48:02, 12.40s/it] + 34%|███▍ | 2501/7378 [8:35:22<16:42:29, 12.33s/it] + +{'loss': 0.452, 'learning_rate': 1.539868368814724e-05, 'epoch': 0.34} + + 34%|███▍ | 2501/7378 [8:35:22<16:42:29, 12.33s/it] + 34%|███▍ | 2502/7378 [8:35:34<16:39:43, 12.30s/it] + +{'loss': 0.4475, 'learning_rate': 1.539498776009675e-05, 'epoch': 0.34} + + 34%|███▍ | 2502/7378 [8:35:34<16:39:43, 12.30s/it] + 34%|███▍ | 2503/7378 [8:35:46<16:40:51, 12.32s/it] + +{'loss': 0.4738, 'learning_rate': 1.539129079224692e-05, 'epoch': 0.34} + + 34%|███▍ | 2503/7378 [8:35:46<16:40:51, 12.32s/it] + 34%|███▍ | 2504/7378 [8:35:59<16:36:45, 12.27s/it] + +{'loss': 0.4895, 'learning_rate': 1.538759278531028e-05, 'epoch': 0.34} + + 34%|███▍ | 2504/7378 [8:35:59<16:36:45, 12.27s/it] + 34%|███▍ | 2505/7378 [8:36:11<16:44:01, 12.36s/it] + +{'loss': 0.4271, 'learning_rate': 1.538389373999956e-05, 'epoch': 0.34} + + 34%|███▍ | 2505/7378 
[8:36:11<16:44:01, 12.36s/it] + 34%|███▍ | 2506/7378 [8:36:23<16:38:44, 12.30s/it] + +{'loss': 0.4682, 'learning_rate': 1.5380193657027702e-05, 'epoch': 0.34} + + 34%|███▍ | 2506/7378 [8:36:23<16:38:44, 12.30s/it] + 34%|███▍ | 2507/7378 [8:36:36<16:58:31, 12.55s/it] + +{'loss': 0.5175, 'learning_rate': 1.5376492537107833e-05, 'epoch': 0.34} + + 34%|███▍ | 2507/7378 [8:36:37<16:58:31, 12.55s/it] + 34%|███▍ | 2508/7378 [8:36:49<16:49:40, 12.44s/it] + +{'loss': 0.4686, 'learning_rate': 1.537279038095328e-05, 'epoch': 0.34} + + 34%|███▍ | 2508/7378 [8:36:49<16:49:40, 12.44s/it] + 34%|███▍ | 2509/7378 [8:37:01<16:38:04, 12.30s/it] + +{'loss': 0.4636, 'learning_rate': 1.536908718927759e-05, 'epoch': 0.34} + + 34%|███▍ | 2509/7378 [8:37:01<16:38:04, 12.30s/it] + 34%|███▍ | 2510/7378 [8:37:14<16:53:01, 12.49s/it] + +{'loss': 0.4494, 'learning_rate': 1.536538296279448e-05, 'epoch': 0.34} + + 34%|███▍ | 2510/7378 [8:37:14<16:53:01, 12.49s/it] + 34%|███▍ | 2511/7378 [8:37:26<16:42:31, 12.36s/it] + +{'loss': 0.4554, 'learning_rate': 1.5361677702217895e-05, 'epoch': 0.34} + + 34%|███▍ | 2511/7378 [8:37:26<16:42:31, 12.36s/it] + 34%|███▍ | 2512/7378 [8:37:38<16:47:19, 12.42s/it] + +{'loss': 0.4866, 'learning_rate': 1.5357971408261954e-05, 'epoch': 0.34} + + 34%|███▍ | 2512/7378 [8:37:38<16:47:19, 12.42s/it] + 34%|███▍ | 2513/7378 [8:37:50<16:41:43, 12.35s/it] + +{'loss': 0.4749, 'learning_rate': 1.5354264081640997e-05, 'epoch': 0.34} + + 34%|███▍ | 2513/7378 [8:37:50<16:41:43, 12.35s/it] + 34%|███▍ | 2514/7378 [8:38:02<16:33:03, 12.25s/it] + +{'loss': 0.4804, 'learning_rate': 1.5350555723069545e-05, 'epoch': 0.34} + + 34%|███▍ | 2514/7378 [8:38:02<16:33:03, 12.25s/it] + 34%|███▍ | 2515/7378 [8:38:15<16:35:25, 12.28s/it] + +{'loss': 0.4885, 'learning_rate': 1.534684633326233e-05, 'epoch': 0.34} + + 34%|███▍ | 2515/7378 [8:38:15<16:35:25, 12.28s/it] + 34%|███▍ | 2516/7378 [8:38:27<16:33:39, 12.26s/it] + +{'loss': 0.455, 'learning_rate': 1.534313591293428e-05, 'epoch': 0.34} + + 
34%|███▍ | 2516/7378 [8:38:27<16:33:39, 12.26s/it] + 34%|███▍ | 2517/7378 [8:38:39<16:31:59, 12.24s/it] + +{'loss': 0.5073, 'learning_rate': 1.533942446280052e-05, 'epoch': 0.34} + + 34%|███▍ | 2517/7378 [8:38:39<16:31:59, 12.24s/it] + 34%|███▍ | 2518/7378 [8:38:52<16:35:16, 12.29s/it] + +{'loss': 0.5058, 'learning_rate': 1.533571198357637e-05, 'epoch': 0.34} + + 34%|███▍ | 2518/7378 [8:38:52<16:35:16, 12.29s/it] + 34%|███▍ | 2519/7378 [8:39:04<16:43:34, 12.39s/it] + +{'loss': 0.3517, 'learning_rate': 1.5331998475977354e-05, 'epoch': 0.34} + + 34%|███▍ | 2519/7378 [8:39:04<16:43:34, 12.39s/it] + 34%|███▍ | 2520/7378 [8:39:17<16:45:02, 12.41s/it] + +{'loss': 0.4709, 'learning_rate': 1.5328283940719196e-05, 'epoch': 0.34} + + 34%|███▍ | 2520/7378 [8:39:17<16:45:02, 12.41s/it] + 34%|███▍ | 2521/7378 [8:39:29<16:40:12, 12.36s/it] + +{'loss': 0.4566, 'learning_rate': 1.532456837851781e-05, 'epoch': 0.34} + + 34%|███▍ | 2521/7378 [8:39:29<16:40:12, 12.36s/it] + 34%|███▍ | 2522/7378 [8:39:41<16:45:35, 12.42s/it] + +{'loss': 0.52, 'learning_rate': 1.5320851790089314e-05, 'epoch': 0.34} + + 34%|███▍ | 2522/7378 [8:39:41<16:45:35, 12.42s/it] + 34%|███▍ | 2523/7378 [8:39:54<16:38:24, 12.34s/it] + +{'loss': 0.5185, 'learning_rate': 1.5317134176150025e-05, 'epoch': 0.34} + + 34%|███▍ | 2523/7378 [8:39:54<16:38:24, 12.34s/it] + 34%|███▍ | 2524/7378 [8:40:06<16:42:06, 12.39s/it] + +{'loss': 0.4703, 'learning_rate': 1.5313415537416448e-05, 'epoch': 0.34} + + 34%|███▍ | 2524/7378 [8:40:06<16:42:06, 12.39s/it] + 34%|███▍ | 2525/7378 [8:40:18<16:34:33, 12.30s/it] + +{'loss': 0.4401, 'learning_rate': 1.53096958746053e-05, 'epoch': 0.34} + + 34%|███▍ | 2525/7378 [8:40:18<16:34:33, 12.30s/it] + 34%|███▍ | 2526/7378 [8:40:31<16:36:40, 12.32s/it] + +{'loss': 0.4621, 'learning_rate': 1.530597518843348e-05, 'epoch': 0.34} + + 34%|███▍ | 2526/7378 [8:40:31<16:36:40, 12.32s/it] + 34%|███▍ | 2527/7378 [8:40:44<16:50:30, 12.50s/it] + +{'loss': 0.4558, 'learning_rate': 1.5302253479618097e-05, 
'epoch': 0.34} + + 34%|███▍ | 2527/7378 [8:40:44<16:50:30, 12.50s/it] + 34%|███▍ | 2528/7378 [8:40:56<16:46:57, 12.46s/it] + +{'loss': 0.4547, 'learning_rate': 1.5298530748876453e-05, 'epoch': 0.34} + + 34%|███▍ | 2528/7378 [8:40:56<16:46:57, 12.46s/it] + 34%|███▍ | 2529/7378 [8:41:08<16:40:18, 12.38s/it] + +{'loss': 0.4557, 'learning_rate': 1.5294806996926043e-05, 'epoch': 0.34} + + 34%|███▍ | 2529/7378 [8:41:08<16:40:18, 12.38s/it] + 34%|███▍ | 2530/7378 [8:41:20<16:37:01, 12.34s/it] + +{'loss': 0.4497, 'learning_rate': 1.5291082224484565e-05, 'epoch': 0.34} + + 34%|███▍ | 2530/7378 [8:41:20<16:37:01, 12.34s/it] + 34%|███▍ | 2531/7378 [8:41:33<16:46:48, 12.46s/it] + +{'loss': 0.5295, 'learning_rate': 1.5287356432269907e-05, 'epoch': 0.34} + + 34%|███▍ | 2531/7378 [8:41:33<16:46:48, 12.46s/it] + 34%|███▍ | 2532/7378 [8:41:45<16:37:21, 12.35s/it] + +{'loss': 0.4827, 'learning_rate': 1.528362962100016e-05, 'epoch': 0.34} + + 34%|███▍ | 2532/7378 [8:41:45<16:37:21, 12.35s/it] + 34%|███▍ | 2533/7378 [8:41:58<16:38:40, 12.37s/it] + +{'loss': 0.4455, 'learning_rate': 1.5279901791393605e-05, 'epoch': 0.34} + + 34%|███▍ | 2533/7378 [8:41:58<16:38:40, 12.37s/it] + 34%|███▍ | 2534/7378 [8:42:10<16:38:20, 12.37s/it] + +{'loss': 0.4841, 'learning_rate': 1.5276172944168725e-05, 'epoch': 0.34} + + 34%|███▍ | 2534/7378 [8:42:10<16:38:20, 12.37s/it] + 34%|███▍ | 2535/7378 [8:42:22<16:37:37, 12.36s/it] + +{'loss': 0.5186, 'learning_rate': 1.5272443080044194e-05, 'epoch': 0.34} + + 34%|███▍ | 2535/7378 [8:42:22<16:37:37, 12.36s/it] + 34%|███▍ | 2536/7378 [8:42:36<17:00:03, 12.64s/it] + +{'loss': 0.4175, 'learning_rate': 1.5268712199738895e-05, 'epoch': 0.34} + + 34%|███▍ | 2536/7378 [8:42:36<17:00:03, 12.64s/it] + 34%|███▍ | 2537/7378 [8:42:48<16:50:57, 12.53s/it] + +{'loss': 0.4723, 'learning_rate': 1.526498030397188e-05, 'epoch': 0.34} + + 34%|███▍ | 2537/7378 [8:42:48<16:50:57, 12.53s/it] + 34%|███▍ | 2538/7378 [8:43:00<16:41:42, 12.42s/it] + +{'loss': 0.5322, 'learning_rate': 
1.5261247393462427e-05, 'epoch': 0.34} + + 34%|███▍ | 2538/7378 [8:43:00<16:41:42, 12.42s/it] + 34%|███▍ | 2539/7378 [8:43:12<16:35:04, 12.34s/it] + +{'loss': 0.5173, 'learning_rate': 1.5257513468929994e-05, 'epoch': 0.34} + + 34%|███▍ | 2539/7378 [8:43:12<16:35:04, 12.34s/it] + 34%|███▍ | 2540/7378 [8:43:24<16:29:23, 12.27s/it] + +{'loss': 0.4539, 'learning_rate': 1.525377853109423e-05, 'epoch': 0.34} + + 34%|███▍ | 2540/7378 [8:43:24<16:29:23, 12.27s/it] + 34%|███▍ | 2541/7378 [8:43:36<16:28:31, 12.26s/it] + +{'loss': 0.5005, 'learning_rate': 1.5250042580674993e-05, 'epoch': 0.34} + + 34%|███▍ | 2541/7378 [8:43:36<16:28:31, 12.26s/it] + 34%|███▍ | 2542/7378 [8:43:49<16:29:43, 12.28s/it] + +{'loss': 0.4688, 'learning_rate': 1.5246305618392323e-05, 'epoch': 0.34} + + 34%|███▍ | 2542/7378 [8:43:49<16:29:43, 12.28s/it] + 34%|███▍ | 2543/7378 [8:44:01<16:32:13, 12.31s/it] + +{'loss': 0.4499, 'learning_rate': 1.5242567644966463e-05, 'epoch': 0.34} + + 34%|███▍ | 2543/7378 [8:44:01<16:32:13, 12.31s/it] + 34%|███▍ | 2544/7378 [8:44:13<16:30:58, 12.30s/it] + +{'loss': 0.3744, 'learning_rate': 1.5238828661117856e-05, 'epoch': 0.34} + + 34%|███▍ | 2544/7378 [8:44:13<16:30:58, 12.30s/it] + 34%|███▍ | 2545/7378 [8:44:26<16:25:01, 12.23s/it] + +{'loss': 0.4768, 'learning_rate': 1.5235088667567118e-05, 'epoch': 0.34} + + 34%|███▍ | 2545/7378 [8:44:26<16:25:01, 12.23s/it] + 35%|███▍ | 2546/7378 [8:44:38<16:38:38, 12.40s/it] + +{'loss': 0.5006, 'learning_rate': 1.5231347665035084e-05, 'epoch': 0.35} + + 35%|███▍ | 2546/7378 [8:44:38<16:38:38, 12.40s/it] + 35%|███▍ | 2547/7378 [8:44:50<16:30:33, 12.30s/it] + +{'loss': 0.4615, 'learning_rate': 1.5227605654242772e-05, 'epoch': 0.35} + + 35%|███▍ | 2547/7378 [8:44:50<16:30:33, 12.30s/it] + 35%|███▍ | 2548/7378 [8:45:03<16:28:04, 12.27s/it] + +{'loss': 0.5246, 'learning_rate': 1.5223862635911396e-05, 'epoch': 0.35} + + 35%|███▍ | 2548/7378 [8:45:03<16:28:04, 12.27s/it] + 35%|███▍ | 2549/7378 [8:45:15<16:24:21, 12.23s/it] + +{'loss': 
0.4529, 'learning_rate': 1.5220118610762362e-05, 'epoch': 0.35} + + 35%|███▍ | 2549/7378 [8:45:15<16:24:21, 12.23s/it] + 35%|███▍ | 2550/7378 [8:45:27<16:24:28, 12.23s/it] + +{'loss': 0.4699, 'learning_rate': 1.5216373579517276e-05, 'epoch': 0.35} + + 35%|███▍ | 2550/7378 [8:45:27<16:24:28, 12.23s/it] + 35%|███▍ | 2551/7378 [8:45:39<16:18:54, 12.17s/it] + +{'loss': 0.4293, 'learning_rate': 1.5212627542897934e-05, 'epoch': 0.35} + + 35%|███▍ | 2551/7378 [8:45:39<16:18:54, 12.17s/it] + 35%|███▍ | 2552/7378 [8:45:51<16:21:03, 12.20s/it] + +{'loss': 0.4784, 'learning_rate': 1.520888050162632e-05, 'epoch': 0.35} + + 35%|███▍ | 2552/7378 [8:45:51<16:21:03, 12.20s/it] + 35%|███▍ | 2553/7378 [8:46:03<16:20:51, 12.20s/it] + +{'loss': 0.5171, 'learning_rate': 1.5205132456424622e-05, 'epoch': 0.35} + + 35%|███▍ | 2553/7378 [8:46:03<16:20:51, 12.20s/it] + 35%|███▍ | 2554/7378 [8:46:16<16:17:57, 12.16s/it] + +{'loss': 0.4474, 'learning_rate': 1.5201383408015215e-05, 'epoch': 0.35} + + 35%|███▍ | 2554/7378 [8:46:16<16:17:57, 12.16s/it] + 35%|███▍ | 2555/7378 [8:46:28<16:25:45, 12.26s/it] + +{'loss': 0.4321, 'learning_rate': 1.5197633357120673e-05, 'epoch': 0.35} + + 35%|███▍ | 2555/7378 [8:46:28<16:25:45, 12.26s/it] + 35%|███▍ | 2556/7378 [8:46:40<16:16:06, 12.15s/it] + +{'loss': 0.4573, 'learning_rate': 1.5193882304463756e-05, 'epoch': 0.35} + + 35%|███▍ | 2556/7378 [8:46:40<16:16:06, 12.15s/it] + 35%|███▍ | 2557/7378 [8:46:52<16:15:23, 12.14s/it] + +{'loss': 0.4047, 'learning_rate': 1.5190130250767425e-05, 'epoch': 0.35} + + 35%|███▍ | 2557/7378 [8:46:52<16:15:23, 12.14s/it] + 35%|███▍ | 2558/7378 [8:47:04<16:18:38, 12.18s/it] + +{'loss': 0.4799, 'learning_rate': 1.5186377196754825e-05, 'epoch': 0.35} + + 35%|███▍ | 2558/7378 [8:47:04<16:18:38, 12.18s/it] + 35%|███▍ | 2559/7378 [8:47:16<16:11:43, 12.10s/it] + +{'loss': 0.4855, 'learning_rate': 1.5182623143149297e-05, 'epoch': 0.35} + + 35%|███▍ | 2559/7378 [8:47:16<16:11:43, 12.10s/it] + 35%|███▍ | 2560/7378 [8:47:28<16:10:57, 
12.09s/it] + +{'loss': 0.4507, 'learning_rate': 1.5178868090674381e-05, 'epoch': 0.35} + + 35%|███▍ | 2560/7378 [8:47:28<16:10:57, 12.09s/it] + 35%|███▍ | 2561/7378 [8:47:41<16:25:36, 12.28s/it] + +{'loss': 0.4258, 'learning_rate': 1.51751120400538e-05, 'epoch': 0.35} + + 35%|███▍ | 2561/7378 [8:47:41<16:25:36, 12.28s/it] + 35%|███▍ | 2562/7378 [8:47:53<16:21:05, 12.22s/it] + +{'loss': 0.496, 'learning_rate': 1.5171354992011478e-05, 'epoch': 0.35} + + 35%|███▍ | 2562/7378 [8:47:53<16:21:05, 12.22s/it] + 35%|███▍ | 2563/7378 [8:48:05<16:21:55, 12.24s/it] + +{'loss': 0.474, 'learning_rate': 1.5167596947271523e-05, 'epoch': 0.35} + + 35%|███▍ | 2563/7378 [8:48:05<16:21:55, 12.24s/it] + 35%|███▍ | 2564/7378 [8:48:18<16:24:28, 12.27s/it] + +{'loss': 0.51, 'learning_rate': 1.5163837906558243e-05, 'epoch': 0.35} + + 35%|███▍ | 2564/7378 [8:48:18<16:24:28, 12.27s/it] + 35%|███▍ | 2565/7378 [8:48:30<16:27:04, 12.31s/it] + +{'loss': 0.4898, 'learning_rate': 1.5160077870596133e-05, 'epoch': 0.35} + + 35%|███▍ | 2565/7378 [8:48:30<16:27:04, 12.31s/it] + 35%|███▍ | 2566/7378 [8:48:43<16:44:26, 12.52s/it] + +{'loss': 0.4918, 'learning_rate': 1.515631684010988e-05, 'epoch': 0.35} + + 35%|███▍ | 2566/7378 [8:48:43<16:44:26, 12.52s/it] + 35%|███▍ | 2567/7378 [8:48:56<16:42:32, 12.50s/it] + +{'loss': 0.4413, 'learning_rate': 1.5152554815824366e-05, 'epoch': 0.35} + + 35%|███▍ | 2567/7378 [8:48:56<16:42:32, 12.50s/it] + 35%|███▍ | 2568/7378 [8:49:08<16:47:15, 12.56s/it] + +{'loss': 0.4653, 'learning_rate': 1.5148791798464654e-05, 'epoch': 0.35} + + 35%|███▍ | 2568/7378 [8:49:08<16:47:15, 12.56s/it] + 35%|███▍ | 2569/7378 [8:49:21<16:38:22, 12.46s/it] + +{'loss': 0.487, 'learning_rate': 1.5145027788756022e-05, 'epoch': 0.35} + + 35%|███▍ | 2569/7378 [8:49:21<16:38:22, 12.46s/it] + 35%|███▍ | 2570/7378 [8:49:33<16:29:31, 12.35s/it] + +{'loss': 0.4532, 'learning_rate': 1.514126278742391e-05, 'epoch': 0.35} + + 35%|███▍ | 2570/7378 [8:49:33<16:29:31, 12.35s/it] + 35%|███▍ | 2571/7378 
[8:49:45<16:32:38, 12.39s/it] + +{'loss': 0.4502, 'learning_rate': 1.513749679519397e-05, 'epoch': 0.35} + + 35%|███▍ | 2571/7378 [8:49:45<16:32:38, 12.39s/it] + 35%|███▍ | 2572/7378 [8:49:58<16:35:43, 12.43s/it] + +{'loss': 0.4576, 'learning_rate': 1.5133729812792035e-05, 'epoch': 0.35} + + 35%|███▍ | 2572/7378 [8:49:58<16:35:43, 12.43s/it] + 35%|███▍ | 2573/7378 [8:50:10<16:29:48, 12.36s/it] + +{'loss': 0.5212, 'learning_rate': 1.5129961840944131e-05, 'epoch': 0.35} + + 35%|███▍ | 2573/7378 [8:50:10<16:29:48, 12.36s/it] + 35%|███▍ | 2574/7378 [8:50:22<16:19:58, 12.24s/it] + +{'loss': 0.3799, 'learning_rate': 1.512619288037648e-05, 'epoch': 0.35} + + 35%|███▍ | 2574/7378 [8:50:22<16:19:58, 12.24s/it] + 35%|███▍ | 2575/7378 [8:50:34<16:22:21, 12.27s/it] + +{'loss': 0.5039, 'learning_rate': 1.5122422931815487e-05, 'epoch': 0.35} + + 35%|███▍ | 2575/7378 [8:50:34<16:22:21, 12.27s/it] + 35%|███▍ | 2576/7378 [8:50:46<16:17:45, 12.22s/it] + +{'loss': 0.4662, 'learning_rate': 1.5118651995987752e-05, 'epoch': 0.35} + + 35%|███▍ | 2576/7378 [8:50:46<16:17:45, 12.22s/it] + 35%|███▍ | 2577/7378 [8:50:58<16:13:47, 12.17s/it] + +{'loss': 0.4798, 'learning_rate': 1.511488007362006e-05, 'epoch': 0.35} + + 35%|███▍ | 2577/7378 [8:50:58<16:13:47, 12.17s/it] + 35%|███▍ | 2578/7378 [8:51:10<16:11:25, 12.14s/it] + +{'loss': 0.4984, 'learning_rate': 1.5111107165439393e-05, 'epoch': 0.35} + + 35%|███▍ | 2578/7378 [8:51:10<16:11:25, 12.14s/it] + 35%|███▍ | 2579/7378 [8:51:23<16:22:06, 12.28s/it] + +{'loss': 0.4285, 'learning_rate': 1.5107333272172922e-05, 'epoch': 0.35} + + 35%|███▍ | 2579/7378 [8:51:23<16:22:06, 12.28s/it] + 35%|███▍ | 2580/7378 [8:51:35<16:25:28, 12.32s/it] + +{'loss': 0.4643, 'learning_rate': 1.5103558394548002e-05, 'epoch': 0.35} + + 35%|███▍ | 2580/7378 [8:51:35<16:25:28, 12.32s/it] + 35%|███▍ | 2581/7378 [8:51:48<16:33:15, 12.42s/it] + +{'loss': 0.4761, 'learning_rate': 1.5099782533292184e-05, 'epoch': 0.35} + + 35%|███▍ | 2581/7378 [8:51:48<16:33:15, 12.42s/it] + 
35%|███▍ | 2582/7378 [8:52:01<16:38:14, 12.49s/it] + +{'loss': 0.4924, 'learning_rate': 1.5096005689133203e-05, 'epoch': 0.35} + + 35%|███▍ | 2582/7378 [8:52:01<16:38:14, 12.49s/it] + 35%|███▌ | 2583/7378 [8:52:13<16:25:44, 12.33s/it] + +{'loss': 0.471, 'learning_rate': 1.509222786279899e-05, 'epoch': 0.35} + + 35%|███▌ | 2583/7378 [8:52:13<16:25:44, 12.33s/it] + 35%|███▌ | 2584/7378 [8:52:25<16:22:00, 12.29s/it] + +{'loss': 0.5335, 'learning_rate': 1.508844905501766e-05, 'epoch': 0.35} + + 35%|███▌ | 2584/7378 [8:52:25<16:22:00, 12.29s/it] + 35%|███▌ | 2585/7378 [8:52:37<16:27:20, 12.36s/it] + +{'loss': 0.5593, 'learning_rate': 1.5084669266517518e-05, 'epoch': 0.35} + + 35%|███▌ | 2585/7378 [8:52:37<16:27:20, 12.36s/it] + 35%|███▌ | 2586/7378 [8:52:50<16:33:35, 12.44s/it] + +{'loss': 0.4493, 'learning_rate': 1.508088849802706e-05, 'epoch': 0.35} + + 35%|███▌ | 2586/7378 [8:52:50<16:33:35, 12.44s/it] + 35%|███▌ | 2587/7378 [8:53:02<16:28:46, 12.38s/it] + +{'loss': 0.5385, 'learning_rate': 1.5077106750274972e-05, 'epoch': 0.35} + + 35%|███▌ | 2587/7378 [8:53:02<16:28:46, 12.38s/it] + 35%|███▌ | 2588/7378 [8:53:15<16:30:47, 12.41s/it] + +{'loss': 0.4396, 'learning_rate': 1.5073324023990124e-05, 'epoch': 0.35} + + 35%|███▌ | 2588/7378 [8:53:15<16:30:47, 12.41s/it] + 35%|███▌ | 2589/7378 [8:53:27<16:24:24, 12.33s/it] + +{'loss': 0.3985, 'learning_rate': 1.5069540319901577e-05, 'epoch': 0.35} + + 35%|███▌ | 2589/7378 [8:53:27<16:24:24, 12.33s/it] + 35%|███▌ | 2590/7378 [8:53:39<16:17:11, 12.25s/it] + +{'loss': 0.4414, 'learning_rate': 1.5065755638738581e-05, 'epoch': 0.35} + + 35%|███▌ | 2590/7378 [8:53:39<16:17:11, 12.25s/it] + 35%|███▌ | 2591/7378 [8:53:51<16:18:31, 12.26s/it] + +{'loss': 0.4312, 'learning_rate': 1.5061969981230577e-05, 'epoch': 0.35} + + 35%|███▌ | 2591/7378 [8:53:51<16:18:31, 12.26s/it] + 35%|███▌ | 2592/7378 [8:54:03<16:15:46, 12.23s/it] + +{'loss': 0.5048, 'learning_rate': 1.505818334810719e-05, 'epoch': 0.35} + + 35%|███▌ | 2592/7378 
[8:54:03<16:15:46, 12.23s/it] + 35%|███▌ | 2593/7378 [8:54:16<16:16:15, 12.24s/it] + +{'loss': 0.4393, 'learning_rate': 1.5054395740098228e-05, 'epoch': 0.35} + + 35%|███▌ | 2593/7378 [8:54:16<16:16:15, 12.24s/it] + 35%|███▌ | 2594/7378 [8:54:28<16:18:28, 12.27s/it] + +{'loss': 0.4928, 'learning_rate': 1.5050607157933703e-05, 'epoch': 0.35} + + 35%|███▌ | 2594/7378 [8:54:28<16:18:28, 12.27s/it] + 35%|███▌ | 2595/7378 [8:54:40<16:07:51, 12.14s/it] + +{'loss': 0.5044, 'learning_rate': 1.5046817602343797e-05, 'epoch': 0.35} + + 35%|███▌ | 2595/7378 [8:54:40<16:07:51, 12.14s/it] + 35%|███▌ | 2596/7378 [8:54:52<16:14:58, 12.23s/it] + +{'loss': 0.4511, 'learning_rate': 1.5043027074058891e-05, 'epoch': 0.35} + + 35%|███▌ | 2596/7378 [8:54:52<16:14:58, 12.23s/it] + 35%|███▌ | 2597/7378 [8:55:05<16:21:27, 12.32s/it] + +{'loss': 0.5082, 'learning_rate': 1.503923557380955e-05, 'epoch': 0.35} + + 35%|███▌ | 2597/7378 [8:55:05<16:21:27, 12.32s/it] + 35%|███▌ | 2598/7378 [8:55:17<16:21:50, 12.32s/it] + +{'loss': 0.4936, 'learning_rate': 1.5035443102326523e-05, 'epoch': 0.35} + + 35%|███▌ | 2598/7378 [8:55:17<16:21:50, 12.32s/it] + 35%|███▌ | 2599/7378 [8:55:29<16:20:14, 12.31s/it] + +{'loss': 0.5055, 'learning_rate': 1.5031649660340754e-05, 'epoch': 0.35} + + 35%|███▌ | 2599/7378 [8:55:29<16:20:14, 12.31s/it] + 35%|███▌ | 2600/7378 [8:55:42<16:20:33, 12.31s/it] + +{'loss': 0.4826, 'learning_rate': 1.5027855248583368e-05, 'epoch': 0.35} + + 35%|███▌ | 2600/7378 [8:55:42<16:20:33, 12.31s/it] + 35%|███▌ | 2601/7378 [8:55:54<16:20:09, 12.31s/it] + +{'loss': 0.4686, 'learning_rate': 1.5024059867785678e-05, 'epoch': 0.35} + + 35%|███▌ | 2601/7378 [8:55:54<16:20:09, 12.31s/it] + 35%|███▌ | 2602/7378 [8:56:07<16:28:19, 12.42s/it] + +{'loss': 0.5018, 'learning_rate': 1.5020263518679183e-05, 'epoch': 0.35} + + 35%|███▌ | 2602/7378 [8:56:07<16:28:19, 12.42s/it] + 35%|███▌ | 2603/7378 [8:56:19<16:20:12, 12.32s/it] + +{'loss': 0.468, 'learning_rate': 1.5016466201995572e-05, 'epoch': 0.35} + 
+ 35%|███▌ | 2603/7378 [8:56:19<16:20:12, 12.32s/it] + 35%|███▌ | 2604/7378 [8:56:31<16:12:14, 12.22s/it] + +{'loss': 0.4648, 'learning_rate': 1.5012667918466716e-05, 'epoch': 0.35} + + 35%|███▌ | 2604/7378 [8:56:31<16:12:14, 12.22s/it] + 35%|███▌ | 2605/7378 [8:56:43<16:15:39, 12.26s/it] + +{'loss': 0.5124, 'learning_rate': 1.5008868668824676e-05, 'epoch': 0.35} + + 35%|███▌ | 2605/7378 [8:56:43<16:15:39, 12.26s/it] + 35%|███▌ | 2606/7378 [8:56:55<16:17:04, 12.29s/it] + +{'loss': 0.5036, 'learning_rate': 1.5005068453801697e-05, 'epoch': 0.35} + + 35%|███▌ | 2606/7378 [8:56:55<16:17:04, 12.29s/it] + 35%|███▌ | 2607/7378 [8:57:07<16:06:22, 12.15s/it] + +{'loss': 0.5045, 'learning_rate': 1.500126727413021e-05, 'epoch': 0.35} + + 35%|███▌ | 2607/7378 [8:57:07<16:06:22, 12.15s/it] + 35%|███▌ | 2608/7378 [8:57:19<15:58:59, 12.06s/it] + +{'loss': 0.4611, 'learning_rate': 1.4997465130542838e-05, 'epoch': 0.35} + + 35%|███▌ | 2608/7378 [8:57:19<15:58:59, 12.06s/it] + 35%|███▌ | 2609/7378 [8:57:32<16:10:55, 12.22s/it] + +{'loss': 0.4773, 'learning_rate': 1.4993662023772379e-05, 'epoch': 0.35} + + 35%|███▌ | 2609/7378 [8:57:32<16:10:55, 12.22s/it] + 35%|███▌ | 2610/7378 [8:57:44<16:14:36, 12.26s/it] + +{'loss': 0.4917, 'learning_rate': 1.4989857954551826e-05, 'epoch': 0.35} + + 35%|███▌ | 2610/7378 [8:57:44<16:14:36, 12.26s/it] + 35%|███▌ | 2611/7378 [8:57:56<16:16:51, 12.30s/it] + +{'loss': 0.4795, 'learning_rate': 1.4986052923614347e-05, 'epoch': 0.35} + + 35%|███▌ | 2611/7378 [8:57:56<16:16:51, 12.30s/it] + 35%|███▌ | 2612/7378 [8:58:09<16:12:56, 12.25s/it] + +{'loss': 0.456, 'learning_rate': 1.4982246931693309e-05, 'epoch': 0.35} + + 35%|███▌ | 2612/7378 [8:58:09<16:12:56, 12.25s/it] + 35%|███▌ | 2613/7378 [8:58:22<16:30:42, 12.47s/it] + +{'loss': 0.4718, 'learning_rate': 1.4978439979522255e-05, 'epoch': 0.35} + + 35%|███▌ | 2613/7378 [8:58:22<16:30:42, 12.47s/it] + 35%|███▌ | 2614/7378 [8:58:34<16:34:40, 12.53s/it] + +{'loss': 0.4922, 'learning_rate': 
1.4974632067834918e-05, 'epoch': 0.35} + + 35%|███▌ | 2614/7378 [8:58:34<16:34:40, 12.53s/it] + 35%|███▌ | 2615/7378 [8:58:46<16:23:11, 12.39s/it] + +{'loss': 0.5003, 'learning_rate': 1.4970823197365208e-05, 'epoch': 0.35} + + 35%|███▌ | 2615/7378 [8:58:46<16:23:11, 12.39s/it] + 35%|███▌ | 2616/7378 [8:58:59<16:27:30, 12.44s/it] + +{'loss': 0.3992, 'learning_rate': 1.4967013368847229e-05, 'epoch': 0.35} + + 35%|███▌ | 2616/7378 [8:58:59<16:27:30, 12.44s/it] + 35%|███▌ | 2617/7378 [8:59:11<16:29:01, 12.46s/it] + +{'loss': 0.5093, 'learning_rate': 1.4963202583015264e-05, 'epoch': 0.35} + + 35%|███▌ | 2617/7378 [8:59:11<16:29:01, 12.46s/it] + 35%|███▌ | 2618/7378 [8:59:24<16:23:52, 12.40s/it] + +{'loss': 0.4374, 'learning_rate': 1.4959390840603786e-05, 'epoch': 0.35} + + 35%|███▌ | 2618/7378 [8:59:24<16:23:52, 12.40s/it] + 35%|███▌ | 2619/7378 [8:59:36<16:26:59, 12.44s/it] + +{'loss': 0.5006, 'learning_rate': 1.4955578142347442e-05, 'epoch': 0.35} + + 35%|███▌ | 2619/7378 [8:59:36<16:26:59, 12.44s/it] + 36%|███▌ | 2620/7378 [8:59:49<16:25:00, 12.42s/it] + +{'loss': 0.4385, 'learning_rate': 1.4951764488981075e-05, 'epoch': 0.36} + + 36%|███▌ | 2620/7378 [8:59:49<16:25:00, 12.42s/it] + 36%|███▌ | 2621/7378 [9:00:01<16:18:45, 12.35s/it] + +{'loss': 0.5175, 'learning_rate': 1.4947949881239706e-05, 'epoch': 0.36} + + 36%|███▌ | 2621/7378 [9:00:01<16:18:45, 12.35s/it] + 36%|███▌ | 2622/7378 [9:00:13<16:20:37, 12.37s/it] + +{'loss': 0.4909, 'learning_rate': 1.494413431985854e-05, 'epoch': 0.36} + + 36%|███▌ | 2622/7378 [9:00:13<16:20:37, 12.37s/it] + 36%|███▌ | 2623/7378 [9:00:26<16:31:14, 12.51s/it] + +{'loss': 0.4255, 'learning_rate': 1.4940317805572964e-05, 'epoch': 0.36} + + 36%|███▌ | 2623/7378 [9:00:26<16:31:14, 12.51s/it] + 36%|███▌ | 2624/7378 [9:00:38<16:19:05, 12.36s/it] + +{'loss': 0.4535, 'learning_rate': 1.4936500339118556e-05, 'epoch': 0.36} + + 36%|███▌ | 2624/7378 [9:00:38<16:19:05, 12.36s/it] + 36%|███▌ | 2625/7378 [9:00:50<16:14:52, 12.31s/it] + +{'loss': 
0.4086, 'learning_rate': 1.4932681921231072e-05, 'epoch': 0.36} + + 36%|███▌ | 2625/7378 [9:00:50<16:14:52, 12.31s/it] + 36%|███▌ | 2626/7378 [9:01:03<16:24:02, 12.42s/it] + +{'loss': 0.4688, 'learning_rate': 1.4928862552646448e-05, 'epoch': 0.36} + + 36%|███▌ | 2626/7378 [9:01:03<16:24:02, 12.42s/it] + 36%|███▌ | 2627/7378 [9:01:15<16:25:35, 12.45s/it] + +{'loss': 0.4989, 'learning_rate': 1.4925042234100815e-05, 'epoch': 0.36} + + 36%|███▌ | 2627/7378 [9:01:15<16:25:35, 12.45s/it] + 36%|███▌ | 2628/7378 [9:01:28<16:26:28, 12.46s/it] + +{'loss': 0.4835, 'learning_rate': 1.4921220966330472e-05, 'epoch': 0.36} + + 36%|███▌ | 2628/7378 [9:01:28<16:26:28, 12.46s/it] + 36%|███▌ | 2629/7378 [9:01:40<16:18:04, 12.36s/it] + +{'loss': 0.5382, 'learning_rate': 1.4917398750071912e-05, 'epoch': 0.36} + + 36%|███▌ | 2629/7378 [9:01:40<16:18:04, 12.36s/it] + 36%|███▌ | 2630/7378 [9:01:52<16:10:25, 12.26s/it] + +{'loss': 0.464, 'learning_rate': 1.4913575586061809e-05, 'epoch': 0.36} + + 36%|███▌ | 2630/7378 [9:01:52<16:10:25, 12.26s/it] + 36%|███▌ | 2631/7378 [9:02:05<16:16:55, 12.35s/it] + +{'loss': 0.4927, 'learning_rate': 1.4909751475037014e-05, 'epoch': 0.36} + + 36%|███▌ | 2631/7378 [9:02:05<16:16:55, 12.35s/it] + 36%|███▌ | 2632/7378 [9:02:17<16:12:50, 12.30s/it] + +{'loss': 0.5341, 'learning_rate': 1.4905926417734566e-05, 'epoch': 0.36} + + 36%|███▌ | 2632/7378 [9:02:17<16:12:50, 12.30s/it] + 36%|███▌ | 2633/7378 [9:02:29<16:09:00, 12.25s/it] + +{'loss': 0.3965, 'learning_rate': 1.4902100414891685e-05, 'epoch': 0.36} + + 36%|███▌ | 2633/7378 [9:02:29<16:09:00, 12.25s/it] + 36%|███▌ | 2634/7378 [9:02:41<16:10:26, 12.27s/it] + +{'loss': 0.4173, 'learning_rate': 1.4898273467245775e-05, 'epoch': 0.36} + + 36%|███▌ | 2634/7378 [9:02:41<16:10:26, 12.27s/it] + 36%|███▌ | 2635/7378 [9:02:53<16:08:31, 12.25s/it] + +{'loss': 0.476, 'learning_rate': 1.4894445575534418e-05, 'epoch': 0.36} + + 36%|███▌ | 2635/7378 [9:02:53<16:08:31, 12.25s/it] + 36%|███▌ | 2636/7378 [9:03:05<16:01:28, 
12.17s/it] + +{'loss': 0.4272, 'learning_rate': 1.4890616740495379e-05, 'epoch': 0.36} + + 36%|███▌ | 2636/7378 [9:03:05<16:01:28, 12.17s/it] + 36%|███▌ | 2637/7378 [9:03:17<15:56:20, 12.10s/it] + +{'loss': 0.4472, 'learning_rate': 1.4886786962866608e-05, 'epoch': 0.36} + + 36%|███▌ | 2637/7378 [9:03:17<15:56:20, 12.10s/it] + 36%|███▌ | 2638/7378 [9:03:30<15:58:43, 12.14s/it] + +{'loss': 0.4657, 'learning_rate': 1.4882956243386233e-05, 'epoch': 0.36} + + 36%|███▌ | 2638/7378 [9:03:30<15:58:43, 12.14s/it] + 36%|███▌ | 2639/7378 [9:03:42<16:05:39, 12.23s/it] + +{'loss': 0.4765, 'learning_rate': 1.487912458279257e-05, 'epoch': 0.36} + + 36%|███▌ | 2639/7378 [9:03:42<16:05:39, 12.23s/it] + 36%|███▌ | 2640/7378 [9:03:55<16:15:44, 12.36s/it] + +{'loss': 0.5175, 'learning_rate': 1.4875291981824106e-05, 'epoch': 0.36} + + 36%|███▌ | 2640/7378 [9:03:55<16:15:44, 12.36s/it] + 36%|███▌ | 2641/7378 [9:04:07<16:07:51, 12.26s/it] + +{'loss': 0.4624, 'learning_rate': 1.4871458441219515e-05, 'epoch': 0.36} + + 36%|███▌ | 2641/7378 [9:04:07<16:07:51, 12.26s/it] + 36%|███▌ | 2642/7378 [9:04:19<16:06:11, 12.24s/it] + +{'loss': 0.4604, 'learning_rate': 1.4867623961717653e-05, 'epoch': 0.36} + + 36%|███▌ | 2642/7378 [9:04:19<16:06:11, 12.24s/it] + 36%|███▌ | 2643/7378 [9:04:31<16:01:25, 12.18s/it] + +{'loss': 0.5205, 'learning_rate': 1.486378854405756e-05, 'epoch': 0.36} + + 36%|███▌ | 2643/7378 [9:04:31<16:01:25, 12.18s/it] + 36%|███▌ | 2644/7378 [9:04:43<16:08:56, 12.28s/it] + +{'loss': 0.5014, 'learning_rate': 1.4859952188978448e-05, 'epoch': 0.36} + + 36%|███▌ | 2644/7378 [9:04:43<16:08:56, 12.28s/it] + 36%|███▌ | 2645/7378 [9:04:56<16:13:28, 12.34s/it] + +{'loss': 0.4774, 'learning_rate': 1.4856114897219714e-05, 'epoch': 0.36} + + 36%|███▌ | 2645/7378 [9:04:56<16:13:28, 12.34s/it] + 36%|███▌ | 2646/7378 [9:05:08<16:13:17, 12.34s/it] + +{'loss': 0.5086, 'learning_rate': 1.4852276669520938e-05, 'epoch': 0.36} + + 36%|███▌ | 2646/7378 [9:05:08<16:13:17, 12.34s/it] + 36%|███▌ | 
2647/7378 [9:05:21<16:15:52, 12.38s/it] + +{'loss': 0.4764, 'learning_rate': 1.4848437506621876e-05, 'epoch': 0.36} + + 36%|███▌ | 2647/7378 [9:05:21<16:15:52, 12.38s/it] + 36%|███▌ | 2648/7378 [9:05:33<16:15:55, 12.38s/it] + +{'loss': 0.4992, 'learning_rate': 1.484459740926247e-05, 'epoch': 0.36} + + 36%|███▌ | 2648/7378 [9:05:33<16:15:55, 12.38s/it] + 36%|███▌ | 2649/7378 [9:05:46<16:15:37, 12.38s/it] + +{'loss': 0.5271, 'learning_rate': 1.4840756378182833e-05, 'epoch': 0.36} + + 36%|███▌ | 2649/7378 [9:05:46<16:15:37, 12.38s/it] + 36%|███▌ | 2650/7378 [9:05:58<16:17:55, 12.41s/it] + +{'loss': 0.4864, 'learning_rate': 1.4836914414123271e-05, 'epoch': 0.36} + + 36%|███▌ | 2650/7378 [9:05:58<16:17:55, 12.41s/it] + 36%|███▌ | 2651/7378 [9:06:10<16:13:16, 12.35s/it] + +{'loss': 0.4632, 'learning_rate': 1.4833071517824254e-05, 'epoch': 0.36} + + 36%|███▌ | 2651/7378 [9:06:10<16:13:16, 12.35s/it] + 36%|███▌ | 2652/7378 [9:06:22<16:10:13, 12.32s/it] + +{'loss': 0.4271, 'learning_rate': 1.4829227690026448e-05, 'epoch': 0.36} + + 36%|███▌ | 2652/7378 [9:06:22<16:10:13, 12.32s/it] + 36%|███▌ | 2653/7378 [9:06:35<16:12:54, 12.35s/it] + +{'loss': 0.4559, 'learning_rate': 1.4825382931470686e-05, 'epoch': 0.36} + + 36%|███▌ | 2653/7378 [9:06:35<16:12:54, 12.35s/it] + 36%|███▌ | 2654/7378 [9:06:47<16:11:27, 12.34s/it] + +{'loss': 0.4723, 'learning_rate': 1.4821537242897985e-05, 'epoch': 0.36} + + 36%|███▌ | 2654/7378 [9:06:47<16:11:27, 12.34s/it] + 36%|███▌ | 2655/7378 [9:06:59<16:09:48, 12.32s/it] + +{'loss': 0.4554, 'learning_rate': 1.4817690625049542e-05, 'epoch': 0.36} + + 36%|███▌ | 2655/7378 [9:06:59<16:09:48, 12.32s/it] + 36%|███▌ | 2656/7378 [9:07:12<16:08:06, 12.30s/it] + +{'loss': 0.4384, 'learning_rate': 1.4813843078666733e-05, 'epoch': 0.36} + + 36%|███▌ | 2656/7378 [9:07:12<16:08:06, 12.30s/it] + 36%|███▌ | 2657/7378 [9:07:24<16:12:51, 12.36s/it] + +{'loss': 0.5035, 'learning_rate': 1.4809994604491111e-05, 'epoch': 0.36} + + 36%|███▌ | 2657/7378 [9:07:24<16:12:51, 
12.36s/it] + 36%|███▌ | 2658/7378 [9:07:36<16:04:53, 12.27s/it] + +{'loss': 0.4828, 'learning_rate': 1.4806145203264413e-05, 'epoch': 0.36} + + 36%|███▌ | 2658/7378 [9:07:36<16:04:53, 12.27s/it] + 36%|███▌ | 2659/7378 [9:07:49<16:17:24, 12.43s/it] + +{'loss': 0.4012, 'learning_rate': 1.4802294875728542e-05, 'epoch': 0.36} + + 36%|███▌ | 2659/7378 [9:07:49<16:17:24, 12.43s/it] + 36%|███▌ | 2660/7378 [9:08:01<16:15:25, 12.40s/it] + +{'loss': 0.4427, 'learning_rate': 1.4798443622625598e-05, 'epoch': 0.36} + + 36%|███▌ | 2660/7378 [9:08:01<16:15:25, 12.40s/it] + 36%|███▌ | 2661/7378 [9:08:14<16:19:28, 12.46s/it] + +{'loss': 0.4691, 'learning_rate': 1.4794591444697843e-05, 'epoch': 0.36} + + 36%|███▌ | 2661/7378 [9:08:14<16:19:28, 12.46s/it] + 36%|███▌ | 2662/7378 [9:08:26<16:18:26, 12.45s/it] + +{'loss': 0.5124, 'learning_rate': 1.4790738342687729e-05, 'epoch': 0.36} + + 36%|███▌ | 2662/7378 [9:08:26<16:18:26, 12.45s/it] + 36%|███▌ | 2663/7378 [9:08:38<16:08:18, 12.32s/it] + +{'loss': 0.4948, 'learning_rate': 1.4786884317337875e-05, 'epoch': 0.36} + + 36%|███▌ | 2663/7378 [9:08:38<16:08:18, 12.32s/it] + 36%|███▌ | 2664/7378 [9:08:51<16:12:28, 12.38s/it] + +{'loss': 0.5125, 'learning_rate': 1.478302936939109e-05, 'epoch': 0.36} + + 36%|███▌ | 2664/7378 [9:08:51<16:12:28, 12.38s/it] + 36%|███▌ | 2665/7378 [9:09:03<16:07:35, 12.32s/it] + +{'loss': 0.4781, 'learning_rate': 1.4779173499590353e-05, 'epoch': 0.36} + + 36%|███▌ | 2665/7378 [9:09:03<16:07:35, 12.32s/it] + 36%|███▌ | 2666/7378 [9:09:16<16:11:21, 12.37s/it] + +{'loss': 0.4834, 'learning_rate': 1.4775316708678819e-05, 'epoch': 0.36} + + 36%|███▌ | 2666/7378 [9:09:16<16:11:21, 12.37s/it] + 36%|███▌ | 2667/7378 [9:09:28<16:14:48, 12.42s/it] + +{'loss': 0.4241, 'learning_rate': 1.477145899739983e-05, 'epoch': 0.36} + + 36%|███▌ | 2667/7378 [9:09:28<16:14:48, 12.42s/it] + 36%|███▌ | 2668/7378 [9:09:40<16:10:58, 12.37s/it] + +{'loss': 0.4918, 'learning_rate': 1.4767600366496893e-05, 'epoch': 0.36} + + 36%|███▌ | 
2668/7378 [9:09:40<16:10:58, 12.37s/it] + 36%|███▌ | 2669/7378 [9:09:53<16:07:32, 12.33s/it] + +{'loss': 0.4199, 'learning_rate': 1.4763740816713704e-05, 'epoch': 0.36} + + 36%|███▌ | 2669/7378 [9:09:53<16:07:32, 12.33s/it] + 36%|███▌ | 2670/7378 [9:10:05<16:07:05, 12.32s/it] + +{'loss': 0.4366, 'learning_rate': 1.4759880348794127e-05, 'epoch': 0.36} + + 36%|███▌ | 2670/7378 [9:10:05<16:07:05, 12.32s/it] + 36%|███▌ | 2671/7378 [9:10:17<15:57:54, 12.21s/it] + +{'loss': 0.4694, 'learning_rate': 1.4756018963482208e-05, 'epoch': 0.36} + + 36%|███▌ | 2671/7378 [9:10:17<15:57:54, 12.21s/it] + 36%|███▌ | 2672/7378 [9:10:30<16:23:03, 12.53s/it] + +{'loss': 0.5261, 'learning_rate': 1.475215666152217e-05, 'epoch': 0.36} + + 36%|███▌ | 2672/7378 [9:10:30<16:23:03, 12.53s/it] + 36%|███▌ | 2673/7378 [9:10:42<16:13:04, 12.41s/it] + +{'loss': 0.4963, 'learning_rate': 1.474829344365841e-05, 'epoch': 0.36} + + 36%|███▌ | 2673/7378 [9:10:42<16:13:04, 12.41s/it] + 36%|███▌ | 2674/7378 [9:10:54<16:07:31, 12.34s/it] + +{'loss': 0.4393, 'learning_rate': 1.4744429310635501e-05, 'epoch': 0.36} + + 36%|███▌ | 2674/7378 [9:10:55<16:07:31, 12.34s/it] + 36%|███▋ | 2675/7378 [9:11:06<15:59:02, 12.24s/it] + +{'loss': 0.452, 'learning_rate': 1.4740564263198196e-05, 'epoch': 0.36} + + 36%|███▋ | 2675/7378 [9:11:06<15:59:02, 12.24s/it] + 36%|███▋ | 2676/7378 [9:11:20<16:17:13, 12.47s/it] + +{'loss': 0.5136, 'learning_rate': 1.4736698302091423e-05, 'epoch': 0.36} + + 36%|███▋ | 2676/7378 [9:11:20<16:17:13, 12.47s/it] + 36%|███▋ | 2677/7378 [9:11:32<16:12:43, 12.42s/it] + +{'loss': 0.4993, 'learning_rate': 1.473283142806028e-05, 'epoch': 0.36} + + 36%|███▋ | 2677/7378 [9:11:32<16:12:43, 12.42s/it] + 36%|███▋ | 2678/7378 [9:11:44<16:15:05, 12.45s/it] + +{'loss': 0.497, 'learning_rate': 1.4728963641850056e-05, 'epoch': 0.36} + + 36%|███▋ | 2678/7378 [9:11:44<16:15:05, 12.45s/it] + 36%|███▋ | 2679/7378 [9:11:57<16:17:47, 12.49s/it] + +{'loss': 0.5011, 'learning_rate': 1.47250949442062e-05, 'epoch': 
0.36} + + 36%|███▋ | 2679/7378 [9:11:57<16:17:47, 12.49s/it] + 36%|███▋ | 2680/7378 [9:12:09<16:18:18, 12.49s/it] + +{'loss': 0.4156, 'learning_rate': 1.4721225335874343e-05, 'epoch': 0.36} + + 36%|███▋ | 2680/7378 [9:12:09<16:18:18, 12.49s/it] + 36%|███▋ | 2681/7378 [9:12:22<16:11:12, 12.41s/it] + +{'loss': 0.4272, 'learning_rate': 1.471735481760029e-05, 'epoch': 0.36} + + 36%|███▋ | 2681/7378 [9:12:22<16:11:12, 12.41s/it] + 36%|███▋ | 2682/7378 [9:12:34<16:03:09, 12.31s/it] + +{'loss': 0.4926, 'learning_rate': 1.4713483390130027e-05, 'epoch': 0.36} + + 36%|███▋ | 2682/7378 [9:12:34<16:03:09, 12.31s/it] + 36%|███▋ | 2683/7378 [9:12:46<16:07:23, 12.36s/it] + +{'loss': 0.5699, 'learning_rate': 1.4709611054209711e-05, 'epoch': 0.36} + + 36%|███▋ | 2683/7378 [9:12:46<16:07:23, 12.36s/it] + 36%|███▋ | 2684/7378 [9:12:58<16:05:53, 12.35s/it] + +{'loss': 0.4788, 'learning_rate': 1.4705737810585667e-05, 'epoch': 0.36} + + 36%|███▋ | 2684/7378 [9:12:58<16:05:53, 12.35s/it] + 36%|███▋ | 2685/7378 [9:13:11<16:15:08, 12.47s/it] + +{'loss': 0.4655, 'learning_rate': 1.4701863660004411e-05, 'epoch': 0.36} + + 36%|███▋ | 2685/7378 [9:13:11<16:15:08, 12.47s/it] + 36%|███▋ | 2686/7378 [9:13:23<16:05:55, 12.35s/it] + +{'loss': 0.3968, 'learning_rate': 1.4697988603212619e-05, 'epoch': 0.36} + + 36%|███▋ | 2686/7378 [9:13:23<16:05:55, 12.35s/it] + 36%|███▋ | 2687/7378 [9:13:36<16:02:15, 12.31s/it] + +{'loss': 0.4818, 'learning_rate': 1.469411264095715e-05, 'epoch': 0.36} + + 36%|███▋ | 2687/7378 [9:13:36<16:02:15, 12.31s/it] + 36%|███▋ | 2688/7378 [9:13:48<15:55:48, 12.23s/it] + +{'loss': 0.4318, 'learning_rate': 1.4690235773985035e-05, 'epoch': 0.36} + + 36%|███▋ | 2688/7378 [9:13:48<15:55:48, 12.23s/it] + 36%|███▋ | 2689/7378 [9:14:00<15:58:45, 12.27s/it] + +{'loss': 0.4541, 'learning_rate': 1.4686358003043476e-05, 'epoch': 0.36} + + 36%|███▋ | 2689/7378 [9:14:00<15:58:45, 12.27s/it] + 36%|███▋ | 2690/7378 [9:14:12<15:46:55, 12.12s/it] + +{'loss': 0.4674, 'learning_rate': 
1.4682479328879857e-05, 'epoch': 0.36} + + 36%|███▋ | 2690/7378 [9:14:12<15:46:55, 12.12s/it] + 36%|███▋ | 2691/7378 [9:14:24<15:47:49, 12.13s/it] + +{'loss': 0.4158, 'learning_rate': 1.4678599752241728e-05, 'epoch': 0.36} + + 36%|███▋ | 2691/7378 [9:14:24<15:47:49, 12.13s/it] + 36%|███▋ | 2692/7378 [9:14:36<15:42:29, 12.07s/it] + +{'loss': 0.4451, 'learning_rate': 1.467471927387682e-05, 'epoch': 0.36} + + 36%|███▋ | 2692/7378 [9:14:36<15:42:29, 12.07s/it] + 37%|███▋ | 2693/7378 [9:14:48<15:42:56, 12.08s/it] + +{'loss': 0.4949, 'learning_rate': 1.4670837894533032e-05, 'epoch': 0.37} + + 37%|███▋ | 2693/7378 [9:14:48<15:42:56, 12.08s/it] + 37%|███▋ | 2694/7378 [9:15:01<15:56:58, 12.26s/it] + +{'loss': 0.4465, 'learning_rate': 1.466695561495844e-05, 'epoch': 0.37} + + 37%|███▋ | 2694/7378 [9:15:01<15:56:58, 12.26s/it] + 37%|███▋ | 2695/7378 [9:15:13<15:51:53, 12.20s/it] + +{'loss': 0.4806, 'learning_rate': 1.4663072435901293e-05, 'epoch': 0.37} + + 37%|███▋ | 2695/7378 [9:15:13<15:51:53, 12.20s/it] + 37%|███▋ | 2696/7378 [9:15:25<15:54:18, 12.23s/it] + +{'loss': 0.455, 'learning_rate': 1.465918835811001e-05, 'epoch': 0.37} + + 37%|███▋ | 2696/7378 [9:15:25<15:54:18, 12.23s/it] + 37%|███▋ | 2697/7378 [9:15:37<15:43:25, 12.09s/it] + +{'loss': 0.48, 'learning_rate': 1.465530338233319e-05, 'epoch': 0.37} + + 37%|███▋ | 2697/7378 [9:15:37<15:43:25, 12.09s/it] + 37%|███▋ | 2698/7378 [9:15:49<15:41:32, 12.07s/it] + +{'loss': 0.537, 'learning_rate': 1.4651417509319598e-05, 'epoch': 0.37} + + 37%|███▋ | 2698/7378 [9:15:49<15:41:32, 12.07s/it] + 37%|███▋ | 2699/7378 [9:16:01<15:45:53, 12.13s/it] + +{'loss': 0.4244, 'learning_rate': 1.4647530739818179e-05, 'epoch': 0.37} + + 37%|███▋ | 2699/7378 [9:16:01<15:45:53, 12.13s/it] + 37%|███▋ | 2700/7378 [9:16:13<15:53:57, 12.24s/it] + +{'loss': 0.5066, 'learning_rate': 1.4643643074578045e-05, 'epoch': 0.37} + + 37%|███▋ | 2700/7378 [9:16:13<15:53:57, 12.24s/it] + 37%|███▋ | 2701/7378 [9:16:25<15:48:27, 12.17s/it] + +{'loss': 0.4825, 
'learning_rate': 1.463975451434848e-05, 'epoch': 0.37} + + 37%|███▋ | 2701/7378 [9:16:25<15:48:27, 12.17s/it] + 37%|███▋ | 2702/7378 [9:16:38<16:02:58, 12.36s/it] + +{'loss': 0.3993, 'learning_rate': 1.4635865059878947e-05, 'epoch': 0.37} + + 37%|███▋ | 2702/7378 [9:16:38<16:02:58, 12.36s/it] + 37%|███▋ | 2703/7378 [9:16:51<16:01:01, 12.33s/it] + +{'loss': 0.46, 'learning_rate': 1.463197471191907e-05, 'epoch': 0.37} + + 37%|███▋ | 2703/7378 [9:16:51<16:01:01, 12.33s/it] + 37%|███▋ | 2704/7378 [9:17:03<16:03:19, 12.37s/it] + +{'loss': 0.4211, 'learning_rate': 1.4628083471218664e-05, 'epoch': 0.37} + + 37%|███▋ | 2704/7378 [9:17:03<16:03:19, 12.37s/it] + 37%|███▋ | 2705/7378 [9:17:15<15:52:57, 12.24s/it] + +{'loss': 0.4544, 'learning_rate': 1.4624191338527698e-05, 'epoch': 0.37} + + 37%|███▋ | 2705/7378 [9:17:15<15:52:57, 12.24s/it] + 37%|███▋ | 2706/7378 [9:17:27<15:47:26, 12.17s/it] + +{'loss': 0.4179, 'learning_rate': 1.462029831459632e-05, 'epoch': 0.37} + + 37%|███▋ | 2706/7378 [9:17:27<15:47:26, 12.17s/it] + 37%|███▋ | 2707/7378 [9:17:39<15:43:32, 12.12s/it] + +{'loss': 0.5464, 'learning_rate': 1.4616404400174848e-05, 'epoch': 0.37} + + 37%|███▋ | 2707/7378 [9:17:39<15:43:32, 12.12s/it] + 37%|███▋ | 2708/7378 [9:17:51<15:46:31, 12.16s/it] + +{'loss': 0.447, 'learning_rate': 1.4612509596013777e-05, 'epoch': 0.37} + + 37%|███▋ | 2708/7378 [9:17:51<15:46:31, 12.16s/it] + 37%|███▋ | 2709/7378 [9:18:04<15:56:12, 12.29s/it] + +{'loss': 0.355, 'learning_rate': 1.4608613902863769e-05, 'epoch': 0.37} + + 37%|███▋ | 2709/7378 [9:18:04<15:56:12, 12.29s/it] + 37%|███▋ | 2710/7378 [9:18:16<16:04:05, 12.39s/it] + +{'loss': 0.4901, 'learning_rate': 1.4604717321475652e-05, 'epoch': 0.37} + + 37%|███▋ | 2710/7378 [9:18:16<16:04:05, 12.39s/it] + 37%|███▋ | 2711/7378 [9:18:28<15:56:52, 12.30s/it] + +{'loss': 0.4747, 'learning_rate': 1.4600819852600437e-05, 'epoch': 0.37} + + 37%|███▋ | 2711/7378 [9:18:29<15:56:52, 12.30s/it] + 37%|███▋ | 2712/7378 [9:18:41<15:50:35, 12.22s/it] + 
+{'loss': 0.4583, 'learning_rate': 1.4596921496989297e-05, 'epoch': 0.37} + + 37%|███▋ | 2712/7378 [9:18:41<15:50:35, 12.22s/it] + 37%|███▋ | 2713/7378 [9:18:53<15:55:28, 12.29s/it] + +{'loss': 0.5075, 'learning_rate': 1.459302225539358e-05, 'epoch': 0.37} + + 37%|███▋ | 2713/7378 [9:18:53<15:55:28, 12.29s/it] + 37%|███▋ | 2714/7378 [9:19:05<15:55:58, 12.30s/it] + +{'loss': 0.5558, 'learning_rate': 1.4589122128564806e-05, 'epoch': 0.37} + + 37%|███▋ | 2714/7378 [9:19:05<15:55:58, 12.30s/it] + 37%|███▋ | 2715/7378 [9:19:18<15:55:01, 12.29s/it] + +{'loss': 0.4627, 'learning_rate': 1.458522111725466e-05, 'epoch': 0.37} + + 37%|███▋ | 2715/7378 [9:19:18<15:55:01, 12.29s/it] + 37%|███▋ | 2716/7378 [9:19:30<15:51:39, 12.25s/it] + +{'loss': 0.4825, 'learning_rate': 1.4581319222215e-05, 'epoch': 0.37} + + 37%|███▋ | 2716/7378 [9:19:30<15:51:39, 12.25s/it] + 37%|███▋ | 2717/7378 [9:19:42<15:50:36, 12.24s/it] + +{'loss': 0.4992, 'learning_rate': 1.4577416444197858e-05, 'epoch': 0.37} + + 37%|███▋ | 2717/7378 [9:19:42<15:50:36, 12.24s/it] + 37%|███▋ | 2718/7378 [9:19:54<15:49:03, 12.22s/it] + +{'loss': 0.512, 'learning_rate': 1.4573512783955435e-05, 'epoch': 0.37} + + 37%|███▋ | 2718/7378 [9:19:54<15:49:03, 12.22s/it] + 37%|███▋ | 2719/7378 [9:20:06<15:52:50, 12.27s/it] + +{'loss': 0.4481, 'learning_rate': 1.4569608242240092e-05, 'epoch': 0.37} + + 37%|███▋ | 2719/7378 [9:20:07<15:52:50, 12.27s/it] + 37%|███▋ | 2720/7378 [9:20:19<15:51:19, 12.25s/it] + +{'loss': 0.4289, 'learning_rate': 1.456570281980438e-05, 'epoch': 0.37} + + 37%|███▋ | 2720/7378 [9:20:19<15:51:19, 12.25s/it] + 37%|███▋ | 2721/7378 [9:20:31<15:58:31, 12.35s/it] + +{'loss': 0.4851, 'learning_rate': 1.4561796517400996e-05, 'epoch': 0.37} + + 37%|███▋ | 2721/7378 [9:20:31<15:58:31, 12.35s/it] + 37%|███▋ | 2722/7378 [9:20:44<16:02:39, 12.41s/it] + +{'loss': 0.4328, 'learning_rate': 1.4557889335782827e-05, 'epoch': 0.37} + + 37%|███▋ | 2722/7378 [9:20:44<16:02:39, 12.41s/it] + 37%|███▋ | 2723/7378 
[9:20:56<16:05:33, 12.45s/it] + +{'loss': 0.4693, 'learning_rate': 1.455398127570292e-05, 'epoch': 0.37} + + 37%|███▋ | 2723/7378 [9:20:56<16:05:33, 12.45s/it] + 37%|███▋ | 2724/7378 [9:21:09<16:03:30, 12.42s/it] + +{'loss': 0.5475, 'learning_rate': 1.4550072337914487e-05, 'epoch': 0.37} + + 37%|███▋ | 2724/7378 [9:21:09<16:03:30, 12.42s/it] + 37%|███▋ | 2725/7378 [9:21:21<16:10:10, 12.51s/it] + +{'loss': 0.4738, 'learning_rate': 1.4546162523170922e-05, 'epoch': 0.37} + + 37%|███▋ | 2725/7378 [9:21:21<16:10:10, 12.51s/it] + 37%|███▋ | 2726/7378 [9:21:34<16:09:35, 12.51s/it] + +{'loss': 0.4538, 'learning_rate': 1.4542251832225774e-05, 'epoch': 0.37} + + 37%|███▋ | 2726/7378 [9:21:34<16:09:35, 12.51s/it] + 37%|███▋ | 2727/7378 [9:21:46<16:00:44, 12.39s/it] + +{'loss': 0.354, 'learning_rate': 1.4538340265832772e-05, 'epoch': 0.37} + + 37%|███▋ | 2727/7378 [9:21:46<16:00:44, 12.39s/it] + 37%|███▋ | 2728/7378 [9:21:59<16:10:39, 12.52s/it] + +{'loss': 0.4755, 'learning_rate': 1.4534427824745804e-05, 'epoch': 0.37} + + 37%|███▋ | 2728/7378 [9:21:59<16:10:39, 12.52s/it] + 37%|███▋ | 2729/7378 [9:22:11<16:09:27, 12.51s/it] + +{'loss': 0.4081, 'learning_rate': 1.4530514509718938e-05, 'epoch': 0.37} + + 37%|███▋ | 2729/7378 [9:22:11<16:09:27, 12.51s/it] + 37%|███▋ | 2730/7378 [9:22:24<16:12:21, 12.55s/it] + +{'loss': 0.4897, 'learning_rate': 1.4526600321506403e-05, 'epoch': 0.37} + + 37%|███▋ | 2730/7378 [9:22:24<16:12:21, 12.55s/it] + 37%|███▋ | 2731/7378 [9:22:37<16:15:01, 12.59s/it] + +{'loss': 0.5145, 'learning_rate': 1.4522685260862593e-05, 'epoch': 0.37} + + 37%|███▋ | 2731/7378 [9:22:37<16:15:01, 12.59s/it] + 37%|███▋ | 2732/7378 [9:22:49<16:19:16, 12.65s/it] + +{'loss': 0.4923, 'learning_rate': 1.4518769328542077e-05, 'epoch': 0.37} + + 37%|███▋ | 2732/7378 [9:22:49<16:19:16, 12.65s/it] + 37%|███▋ | 2733/7378 [9:23:02<16:07:29, 12.50s/it] + +{'loss': 0.4422, 'learning_rate': 1.4514852525299587e-05, 'epoch': 0.37} + + 37%|███▋ | 2733/7378 [9:23:02<16:07:29, 12.50s/it] 
+ 37%|███▋ | 2734/7378 [9:23:14<16:08:57, 12.52s/it] + +{'loss': 0.4614, 'learning_rate': 1.4510934851890032e-05, 'epoch': 0.37} + + 37%|███▋ | 2734/7378 [9:23:14<16:08:57, 12.52s/it] + 37%|███▋ | 2735/7378 [9:23:27<16:13:58, 12.59s/it] + +{'loss': 0.4784, 'learning_rate': 1.4507016309068477e-05, 'epoch': 0.37} + + 37%|███▋ | 2735/7378 [9:23:27<16:13:58, 12.59s/it] + 37%|███▋ | 2736/7378 [9:23:39<16:12:49, 12.57s/it] + +{'loss': 0.4328, 'learning_rate': 1.4503096897590161e-05, 'epoch': 0.37} + + 37%|███▋ | 2736/7378 [9:23:40<16:12:49, 12.57s/it] + 37%|███▋ | 2737/7378 [9:23:52<16:08:49, 12.53s/it] + +{'loss': 0.4393, 'learning_rate': 1.4499176618210489e-05, 'epoch': 0.37} + + 37%|███▋ | 2737/7378 [9:23:52<16:08:49, 12.53s/it] + 37%|███▋ | 2738/7378 [9:24:04<16:04:04, 12.47s/it] + +{'loss': 0.4797, 'learning_rate': 1.4495255471685035e-05, 'epoch': 0.37} + + 37%|███▋ | 2738/7378 [9:24:04<16:04:04, 12.47s/it] + 37%|███▋ | 2739/7378 [9:24:16<15:50:34, 12.29s/it] + +{'loss': 0.4254, 'learning_rate': 1.4491333458769536e-05, 'epoch': 0.37} + + 37%|███▋ | 2739/7378 [9:24:16<15:50:34, 12.29s/it] + 37%|███▋ | 2740/7378 [9:24:29<15:58:40, 12.40s/it] + +{'loss': 0.4605, 'learning_rate': 1.44874105802199e-05, 'epoch': 0.37} + + 37%|███▋ | 2740/7378 [9:24:29<15:58:40, 12.40s/it] + 37%|███▋ | 2741/7378 [9:24:41<15:48:40, 12.28s/it] + +{'loss': 0.4395, 'learning_rate': 1.44834868367922e-05, 'epoch': 0.37} + + 37%|███▋ | 2741/7378 [9:24:41<15:48:40, 12.28s/it] + 37%|███▋ | 2742/7378 [9:24:53<15:53:20, 12.34s/it] + +{'loss': 0.4223, 'learning_rate': 1.4479562229242671e-05, 'epoch': 0.37} + + 37%|███▋ | 2742/7378 [9:24:53<15:53:20, 12.34s/it] + 37%|███▋ | 2743/7378 [9:25:06<15:57:34, 12.40s/it] + +{'loss': 0.4881, 'learning_rate': 1.4475636758327731e-05, 'epoch': 0.37} + + 37%|███▋ | 2743/7378 [9:25:06<15:57:34, 12.40s/it] + 37%|███▋ | 2744/7378 [9:25:18<15:50:45, 12.31s/it] + +{'loss': 0.5159, 'learning_rate': 1.4471710424803948e-05, 'epoch': 0.37} + + 37%|███▋ | 2744/7378 
[9:25:18<15:50:45, 12.31s/it] + 37%|███▋ | 2745/7378 [9:25:30<15:38:59, 12.16s/it] + +{'loss': 0.4202, 'learning_rate': 1.4467783229428056e-05, 'epoch': 0.37} + + 37%|███▋ | 2745/7378 [9:25:30<15:38:59, 12.16s/it] + 37%|███▋ | 2746/7378 [9:25:42<15:40:56, 12.19s/it] + +{'loss': 0.47, 'learning_rate': 1.4463855172956964e-05, 'epoch': 0.37} + + 37%|███▋ | 2746/7378 [9:25:42<15:40:56, 12.19s/it] + 37%|███▋ | 2747/7378 [9:25:54<15:46:22, 12.26s/it] + +{'loss': 0.4699, 'learning_rate': 1.4459926256147745e-05, 'epoch': 0.37} + + 37%|███▋ | 2747/7378 [9:25:54<15:46:22, 12.26s/it] + 37%|███▋ | 2748/7378 [9:26:07<15:51:03, 12.32s/it] + +{'loss': 0.4879, 'learning_rate': 1.4455996479757634e-05, 'epoch': 0.37} + + 37%|███▋ | 2748/7378 [9:26:07<15:51:03, 12.32s/it] + 37%|███▋ | 2749/7378 [9:26:19<15:51:12, 12.33s/it] + +{'loss': 0.5022, 'learning_rate': 1.4452065844544033e-05, 'epoch': 0.37} + + 37%|███▋ | 2749/7378 [9:26:19<15:51:12, 12.33s/it] + 37%|███▋ | 2750/7378 [9:26:32<15:53:02, 12.36s/it] + +{'loss': 0.3793, 'learning_rate': 1.4448134351264519e-05, 'epoch': 0.37} + + 37%|███▋ | 2750/7378 [9:26:32<15:53:02, 12.36s/it] + 37%|███▋ | 2751/7378 [9:26:44<15:54:23, 12.38s/it] + +{'loss': 0.4376, 'learning_rate': 1.4444202000676812e-05, 'epoch': 0.37} + + 37%|███▋ | 2751/7378 [9:26:44<15:54:23, 12.38s/it] + 37%|███▋ | 2752/7378 [9:26:56<15:49:42, 12.32s/it] + +{'loss': 0.503, 'learning_rate': 1.4440268793538819e-05, 'epoch': 0.37} + + 37%|███▋ | 2752/7378 [9:26:56<15:49:42, 12.32s/it] + 37%|███▋ | 2753/7378 [9:27:09<16:03:19, 12.50s/it] + +{'loss': 0.5111, 'learning_rate': 1.4436334730608603e-05, 'epoch': 0.37} + + 37%|███▋ | 2753/7378 [9:27:09<16:03:19, 12.50s/it] + 37%|███▋ | 2754/7378 [9:27:21<15:56:37, 12.41s/it] + +{'loss': 0.4074, 'learning_rate': 1.4432399812644394e-05, 'epoch': 0.37} + + 37%|███▋ | 2754/7378 [9:27:21<15:56:37, 12.41s/it] + 37%|███▋ | 2755/7378 [9:27:34<15:56:15, 12.41s/it] + +{'loss': 0.4208, 'learning_rate': 1.4428464040404582e-05, 'epoch': 0.37} + + 
37%|███▋ | 2755/7378 [9:27:34<15:56:15, 12.41s/it] + 37%|███▋ | 2756/7378 [9:27:46<16:01:59, 12.49s/it] + +{'loss': 0.5505, 'learning_rate': 1.4424527414647726e-05, 'epoch': 0.37} + + 37%|███▋ | 2756/7378 [9:27:46<16:01:59, 12.49s/it] + 37%|███▋ | 2757/7378 [9:27:59<15:56:33, 12.42s/it] + +{'loss': 0.4145, 'learning_rate': 1.4420589936132553e-05, 'epoch': 0.37} + + 37%|███▋ | 2757/7378 [9:27:59<15:56:33, 12.42s/it] + 37%|███▋ | 2758/7378 [9:28:11<15:55:31, 12.41s/it] + +{'loss': 0.4807, 'learning_rate': 1.4416651605617949e-05, 'epoch': 0.37} + + 37%|███▋ | 2758/7378 [9:28:11<15:55:31, 12.41s/it] + 37%|███▋ | 2759/7378 [9:28:23<15:49:24, 12.33s/it] + +{'loss': 0.4949, 'learning_rate': 1.4412712423862964e-05, 'epoch': 0.37} + + 37%|███▋ | 2759/7378 [9:28:23<15:49:24, 12.33s/it] + 37%|███▋ | 2760/7378 [9:28:35<15:46:24, 12.30s/it] + +{'loss': 0.4811, 'learning_rate': 1.4408772391626813e-05, 'epoch': 0.37} + + 37%|███▋ | 2760/7378 [9:28:35<15:46:24, 12.30s/it] + 37%|███▋ | 2761/7378 [9:28:48<15:44:55, 12.28s/it] + +{'loss': 0.48, 'learning_rate': 1.4404831509668877e-05, 'epoch': 0.37} + + 37%|███▋ | 2761/7378 [9:28:48<15:44:55, 12.28s/it] + 37%|███▋ | 2762/7378 [9:29:00<15:47:35, 12.32s/it] + +{'loss': 0.5044, 'learning_rate': 1.44008897787487e-05, 'epoch': 0.37} + + 37%|███▋ | 2762/7378 [9:29:00<15:47:35, 12.32s/it] + 37%|███▋ | 2763/7378 [9:29:13<15:51:41, 12.37s/it] + +{'loss': 0.5243, 'learning_rate': 1.4396947199625984e-05, 'epoch': 0.37} + + 37%|███▋ | 2763/7378 [9:29:13<15:51:41, 12.37s/it] + 37%|███▋ | 2764/7378 [9:29:25<15:41:46, 12.25s/it] + +{'loss': 0.4434, 'learning_rate': 1.4393003773060605e-05, 'epoch': 0.37} + + 37%|███▋ | 2764/7378 [9:29:25<15:41:46, 12.25s/it] + 37%|███▋ | 2765/7378 [9:29:36<15:34:20, 12.15s/it] + +{'loss': 0.4376, 'learning_rate': 1.4389059499812594e-05, 'epoch': 0.37} + + 37%|███▋ | 2765/7378 [9:29:37<15:34:20, 12.15s/it] + 37%|███▋ | 2766/7378 [9:29:49<15:38:30, 12.21s/it] + +{'loss': 0.4648, 'learning_rate': 
1.4385114380642148e-05, 'epoch': 0.37} + + 37%|███▋ | 2766/7378 [9:29:49<15:38:30, 12.21s/it] + 38%|███▊ | 2767/7378 [9:30:01<15:37:46, 12.20s/it] + +{'loss': 0.4661, 'learning_rate': 1.438116841630963e-05, 'epoch': 0.38} + + 38%|███▊ | 2767/7378 [9:30:01<15:37:46, 12.20s/it] + 38%|███▊ | 2768/7378 [9:30:13<15:33:08, 12.14s/it] + +{'loss': 0.3945, 'learning_rate': 1.4377221607575558e-05, 'epoch': 0.38} + + 38%|███▊ | 2768/7378 [9:30:13<15:33:08, 12.14s/it] + 38%|███▊ | 2769/7378 [9:30:25<15:37:47, 12.21s/it] + +{'loss': 0.4432, 'learning_rate': 1.437327395520062e-05, 'epoch': 0.38} + + 38%|███▊ | 2769/7378 [9:30:25<15:37:47, 12.21s/it] + 38%|███▊ | 2770/7378 [9:30:38<15:41:49, 12.26s/it] + +{'loss': 0.4532, 'learning_rate': 1.4369325459945661e-05, 'epoch': 0.38} + + 38%|███▊ | 2770/7378 [9:30:38<15:41:49, 12.26s/it] + 38%|███▊ | 2771/7378 [9:30:50<15:30:36, 12.12s/it] + +{'loss': 0.4819, 'learning_rate': 1.4365376122571699e-05, 'epoch': 0.38} + + 38%|███▊ | 2771/7378 [9:30:50<15:30:36, 12.12s/it] + 38%|███▊ | 2772/7378 [9:31:03<15:51:20, 12.39s/it] + +{'loss': 0.4928, 'learning_rate': 1.43614259438399e-05, 'epoch': 0.38} + + 38%|███▊ | 2772/7378 [9:31:03<15:51:20, 12.39s/it] + 38%|███▊ | 2773/7378 [9:31:15<15:52:38, 12.41s/it] + +{'loss': 0.4781, 'learning_rate': 1.4357474924511601e-05, 'epoch': 0.38} + + 38%|███▊ | 2773/7378 [9:31:15<15:52:38, 12.41s/it] + 38%|███▊ | 2774/7378 [9:31:27<15:46:56, 12.34s/it] + +{'loss': 0.5256, 'learning_rate': 1.43535230653483e-05, 'epoch': 0.38} + + 38%|███▊ | 2774/7378 [9:31:27<15:46:56, 12.34s/it] + 38%|███▊ | 2775/7378 [9:31:40<15:51:37, 12.40s/it] + +{'loss': 0.4768, 'learning_rate': 1.4349570367111655e-05, 'epoch': 0.38} + + 38%|███▊ | 2775/7378 [9:31:40<15:51:37, 12.40s/it] + 38%|███▊ | 2776/7378 [9:31:52<15:46:50, 12.34s/it] + +{'loss': 0.471, 'learning_rate': 1.4345616830563488e-05, 'epoch': 0.38} + + 38%|███▊ | 2776/7378 [9:31:52<15:46:50, 12.34s/it] + 38%|███▊ | 2777/7378 [9:32:04<15:46:47, 12.35s/it] + +{'loss': 0.458, 
'learning_rate': 1.4341662456465777e-05, 'epoch': 0.38} + + 38%|███▊ | 2777/7378 [9:32:04<15:46:47, 12.35s/it] + 38%|███▊ | 2778/7378 [9:32:17<15:52:37, 12.43s/it] + +{'loss': 0.5276, 'learning_rate': 1.4337707245580675e-05, 'epoch': 0.38} + + 38%|███▊ | 2778/7378 [9:32:17<15:52:37, 12.43s/it] + 38%|███▊ | 2779/7378 [9:32:29<15:42:36, 12.30s/it] + +{'loss': 0.4881, 'learning_rate': 1.4333751198670474e-05, 'epoch': 0.38} + + 38%|███▊ | 2779/7378 [9:32:29<15:42:36, 12.30s/it] + 38%|███▊ | 2780/7378 [9:32:41<15:35:47, 12.21s/it] + +{'loss': 0.4724, 'learning_rate': 1.432979431649765e-05, 'epoch': 0.38} + + 38%|███▊ | 2780/7378 [9:32:41<15:35:47, 12.21s/it] + 38%|███▊ | 2781/7378 [9:32:53<15:40:02, 12.27s/it] + +{'loss': 0.4988, 'learning_rate': 1.4325836599824828e-05, 'epoch': 0.38} + + 38%|███▊ | 2781/7378 [9:32:53<15:40:02, 12.27s/it] + 38%|███▊ | 2782/7378 [9:33:06<15:40:46, 12.28s/it] + +{'loss': 0.4655, 'learning_rate': 1.4321878049414788e-05, 'epoch': 0.38} + + 38%|███▊ | 2782/7378 [9:33:06<15:40:46, 12.28s/it] + 38%|███▊ | 2783/7378 [9:33:18<15:43:10, 12.32s/it] + +{'loss': 0.46, 'learning_rate': 1.4317918666030492e-05, 'epoch': 0.38} + + 38%|███▊ | 2783/7378 [9:33:18<15:43:10, 12.32s/it] + 38%|███▊ | 2784/7378 [9:33:30<15:38:04, 12.25s/it] + +{'loss': 0.5329, 'learning_rate': 1.4313958450435039e-05, 'epoch': 0.38} + + 38%|███▊ | 2784/7378 [9:33:30<15:38:04, 12.25s/it] + 38%|███▊ | 2785/7378 [9:33:43<15:40:29, 12.29s/it] + +{'loss': 0.508, 'learning_rate': 1.4309997403391703e-05, 'epoch': 0.38} + + 38%|███▊ | 2785/7378 [9:33:43<15:40:29, 12.29s/it] + 38%|███▊ | 2786/7378 [9:33:55<15:46:24, 12.37s/it] + +{'loss': 0.4874, 'learning_rate': 1.4306035525663911e-05, 'epoch': 0.38} + + 38%|███▊ | 2786/7378 [9:33:55<15:46:24, 12.37s/it] + 38%|███▊ | 2787/7378 [9:34:07<15:41:05, 12.30s/it] + +{'loss': 0.4945, 'learning_rate': 1.4302072818015253e-05, 'epoch': 0.38} + + 38%|███▊ | 2787/7378 [9:34:07<15:41:05, 12.30s/it] + 38%|███▊ | 2788/7378 [9:34:19<15:37:05, 12.25s/it] 
+ +{'loss': 0.4825, 'learning_rate': 1.4298109281209484e-05, 'epoch': 0.38} + + 38%|███▊ | 2788/7378 [9:34:19<15:37:05, 12.25s/it] + 38%|███▊ | 2789/7378 [9:34:32<15:42:04, 12.32s/it] + +{'loss': 0.4628, 'learning_rate': 1.4294144916010506e-05, 'epoch': 0.38} + + 38%|███▊ | 2789/7378 [9:34:32<15:42:04, 12.32s/it] + 38%|███▊ | 2790/7378 [9:34:44<15:42:49, 12.33s/it] + +{'loss': 0.4461, 'learning_rate': 1.4290179723182395e-05, 'epoch': 0.38} + + 38%|███▊ | 2790/7378 [9:34:44<15:42:49, 12.33s/it] + 38%|███▊ | 2791/7378 [9:34:57<15:49:37, 12.42s/it] + +{'loss': 0.4866, 'learning_rate': 1.4286213703489375e-05, 'epoch': 0.38} + + 38%|███▊ | 2791/7378 [9:34:57<15:49:37, 12.42s/it] + 38%|███▊ | 2792/7378 [9:35:09<15:44:36, 12.36s/it] + +{'loss': 0.5438, 'learning_rate': 1.4282246857695838e-05, 'epoch': 0.38} + + 38%|███▊ | 2792/7378 [9:35:09<15:44:36, 12.36s/it] + 38%|███▊ | 2793/7378 [9:35:21<15:41:44, 12.32s/it] + +{'loss': 0.5026, 'learning_rate': 1.4278279186566326e-05, 'epoch': 0.38} + + 38%|███▊ | 2793/7378 [9:35:21<15:41:44, 12.32s/it] + 38%|███▊ | 2794/7378 [9:35:33<15:33:01, 12.21s/it] + +{'loss': 0.4445, 'learning_rate': 1.4274310690865551e-05, 'epoch': 0.38} + + 38%|███▊ | 2794/7378 [9:35:33<15:33:01, 12.21s/it] + 38%|███▊ | 2795/7378 [9:35:46<15:36:17, 12.26s/it] + +{'loss': 0.4785, 'learning_rate': 1.4270341371358379e-05, 'epoch': 0.38} + + 38%|███▊ | 2795/7378 [9:35:46<15:36:17, 12.26s/it] + 38%|███▊ | 2796/7378 [9:35:58<15:36:35, 12.26s/it] + +{'loss': 0.5097, 'learning_rate': 1.4266371228809825e-05, 'epoch': 0.38} + + 38%|███▊ | 2796/7378 [9:35:58<15:36:35, 12.26s/it] + 38%|███▊ | 2797/7378 [9:36:10<15:34:55, 12.25s/it] + +{'loss': 0.4939, 'learning_rate': 1.4262400263985083e-05, 'epoch': 0.38} + + 38%|███▊ | 2797/7378 [9:36:10<15:34:55, 12.25s/it] + 38%|███▊ | 2798/7378 [9:36:22<15:33:49, 12.23s/it] + +{'loss': 0.466, 'learning_rate': 1.4258428477649484e-05, 'epoch': 0.38} + + 38%|███▊ | 2798/7378 [9:36:22<15:33:49, 12.23s/it] + 38%|███▊ | 2799/7378 
[9:36:34<15:26:52, 12.15s/it] + +{'loss': 0.3925, 'learning_rate': 1.4254455870568538e-05, 'epoch': 0.38} + + 38%|███▊ | 2799/7378 [9:36:34<15:26:52, 12.15s/it] + 38%|███▊ | 2800/7378 [9:36:46<15:29:16, 12.18s/it] + +{'loss': 0.477, 'learning_rate': 1.4250482443507894e-05, 'epoch': 0.38} + + 38%|███▊ | 2800/7378 [9:36:46<15:29:16, 12.18s/it] + 38%|███▊ | 2801/7378 [9:36:59<15:27:01, 12.15s/it] + +{'loss': 0.5143, 'learning_rate': 1.4246508197233374e-05, 'epoch': 0.38} + + 38%|███▊ | 2801/7378 [9:36:59<15:27:01, 12.15s/it] + 38%|███▊ | 2802/7378 [9:37:11<15:38:58, 12.31s/it] + +{'loss': 0.4372, 'learning_rate': 1.4242533132510945e-05, 'epoch': 0.38} + + 38%|███▊ | 2802/7378 [9:37:11<15:38:58, 12.31s/it] + 38%|███▊ | 2803/7378 [9:37:24<15:47:17, 12.42s/it] + +{'loss': 0.5278, 'learning_rate': 1.4238557250106744e-05, 'epoch': 0.38} + + 38%|███▊ | 2803/7378 [9:37:24<15:47:17, 12.42s/it] + 38%|███▊ | 2804/7378 [9:37:36<15:39:50, 12.33s/it] + +{'loss': 0.3915, 'learning_rate': 1.423458055078706e-05, 'epoch': 0.38} + + 38%|███▊ | 2804/7378 [9:37:36<15:39:50, 12.33s/it] + 38%|███▊ | 2805/7378 [9:37:48<15:40:51, 12.34s/it] + +{'loss': 0.4639, 'learning_rate': 1.4230603035318339e-05, 'epoch': 0.38} + + 38%|███▊ | 2805/7378 [9:37:48<15:40:51, 12.34s/it] + 38%|███▊ | 2806/7378 [9:38:01<15:39:34, 12.33s/it] + +{'loss': 0.4674, 'learning_rate': 1.422662470446718e-05, 'epoch': 0.38} + + 38%|███▊ | 2806/7378 [9:38:01<15:39:34, 12.33s/it] + 38%|███▊ | 2807/7378 [9:38:13<15:40:06, 12.34s/it] + +{'loss': 0.4671, 'learning_rate': 1.4222645559000347e-05, 'epoch': 0.38} + + 38%|███▊ | 2807/7378 [9:38:13<15:40:06, 12.34s/it] + 38%|███▊ | 2808/7378 [9:38:25<15:38:50, 12.33s/it] + +{'loss': 0.5305, 'learning_rate': 1.4218665599684762e-05, 'epoch': 0.38} + + 38%|███▊ | 2808/7378 [9:38:25<15:38:50, 12.33s/it] + 38%|███▊ | 2809/7378 [9:38:38<15:43:45, 12.39s/it] + +{'loss': 0.5099, 'learning_rate': 1.4214684827287495e-05, 'epoch': 0.38} + + 38%|███▊ | 2809/7378 [9:38:38<15:43:45, 12.39s/it] + 
38%|███▊ | 2810/7378 [9:38:51<15:50:26, 12.48s/it] + +{'loss': 0.3997, 'learning_rate': 1.4210703242575778e-05, 'epoch': 0.38} + + 38%|███▊ | 2810/7378 [9:38:51<15:50:26, 12.48s/it] + 38%|███▊ | 2811/7378 [9:39:03<15:42:45, 12.39s/it] + +{'loss': 0.5311, 'learning_rate': 1.4206720846317002e-05, 'epoch': 0.38} + + 38%|███▊ | 2811/7378 [9:39:03<15:42:45, 12.39s/it] + 38%|███▊ | 2812/7378 [9:39:15<15:35:41, 12.30s/it] + +{'loss': 0.4774, 'learning_rate': 1.4202737639278708e-05, 'epoch': 0.38} + + 38%|███▊ | 2812/7378 [9:39:15<15:35:41, 12.30s/it] + 38%|███▊ | 2813/7378 [9:39:27<15:29:08, 12.21s/it] + +{'loss': 0.3958, 'learning_rate': 1.4198753622228599e-05, 'epoch': 0.38} + + 38%|███▊ | 2813/7378 [9:39:27<15:29:08, 12.21s/it] + 38%|███▊ | 2814/7378 [9:39:39<15:21:01, 12.11s/it] + +{'loss': 0.4794, 'learning_rate': 1.4194768795934529e-05, 'epoch': 0.38} + + 38%|███▊ | 2814/7378 [9:39:39<15:21:01, 12.11s/it] + 38%|███▊ | 2815/7378 [9:39:51<15:23:44, 12.15s/it] + +{'loss': 0.4281, 'learning_rate': 1.4190783161164512e-05, 'epoch': 0.38} + + 38%|███▊ | 2815/7378 [9:39:51<15:23:44, 12.15s/it] + 38%|███▊ | 2816/7378 [9:40:03<15:26:12, 12.18s/it] + +{'loss': 0.4088, 'learning_rate': 1.4186796718686719e-05, 'epoch': 0.38} + + 38%|███▊ | 2816/7378 [9:40:03<15:26:12, 12.18s/it] + 38%|███▊ | 2817/7378 [9:40:15<15:23:13, 12.15s/it] + +{'loss': 0.4778, 'learning_rate': 1.4182809469269474e-05, 'epoch': 0.38} + + 38%|███▊ | 2817/7378 [9:40:15<15:23:13, 12.15s/it] + 38%|███▊ | 2818/7378 [9:40:27<15:20:11, 12.11s/it] + +{'loss': 0.4896, 'learning_rate': 1.4178821413681254e-05, 'epoch': 0.38} + + 38%|███▊ | 2818/7378 [9:40:27<15:20:11, 12.11s/it] + 38%|███▊ | 2819/7378 [9:40:40<15:25:47, 12.18s/it] + +{'loss': 0.5053, 'learning_rate': 1.4174832552690697e-05, 'epoch': 0.38} + + 38%|███▊ | 2819/7378 [9:40:40<15:25:47, 12.18s/it] + 38%|███▊ | 2820/7378 [9:40:52<15:28:29, 12.22s/it] + +{'loss': 0.4783, 'learning_rate': 1.4170842887066592e-05, 'epoch': 0.38} + + 38%|███▊ | 2820/7378 
[9:40:52<15:28:29, 12.22s/it] + 38%|███▊ | 2821/7378 [9:41:04<15:25:52, 12.19s/it] + +{'loss': 0.5, 'learning_rate': 1.4166852417577882e-05, 'epoch': 0.38} + + 38%|███▊ | 2821/7378 [9:41:04<15:25:52, 12.19s/it] + 38%|███▊ | 2822/7378 [9:41:17<15:36:54, 12.34s/it] + +{'loss': 0.4748, 'learning_rate': 1.4162861144993671e-05, 'epoch': 0.38} + + 38%|███▊ | 2822/7378 [9:41:17<15:36:54, 12.34s/it] + 38%|███▊ | 2823/7378 [9:41:29<15:25:37, 12.19s/it] + +{'loss': 0.4918, 'learning_rate': 1.4158869070083214e-05, 'epoch': 0.38} + + 38%|███▊ | 2823/7378 [9:41:29<15:25:37, 12.19s/it] + 38%|███▊ | 2824/7378 [9:41:41<15:31:26, 12.27s/it] + +{'loss': 0.4694, 'learning_rate': 1.4154876193615921e-05, 'epoch': 0.38} + + 38%|███▊ | 2824/7378 [9:41:41<15:31:26, 12.27s/it] + 38%|███▊ | 2825/7378 [9:41:53<15:32:19, 12.29s/it] + +{'loss': 0.4482, 'learning_rate': 1.4150882516361357e-05, 'epoch': 0.38} + + 38%|███▊ | 2825/7378 [9:41:53<15:32:19, 12.29s/it] + 38%|███▊ | 2826/7378 [9:42:06<15:42:46, 12.43s/it] + +{'loss': 0.4715, 'learning_rate': 1.4146888039089238e-05, 'epoch': 0.38} + + 38%|███▊ | 2826/7378 [9:42:06<15:42:46, 12.43s/it] + 38%|███▊ | 2827/7378 [9:42:19<15:40:28, 12.40s/it] + +{'loss': 0.5946, 'learning_rate': 1.4142892762569439e-05, 'epoch': 0.38} + + 38%|███▊ | 2827/7378 [9:42:19<15:40:28, 12.40s/it] + 38%|███▊ | 2828/7378 [9:42:31<15:32:04, 12.29s/it] + +{'loss': 0.563, 'learning_rate': 1.4138896687571983e-05, 'epoch': 0.38} + + 38%|███▊ | 2828/7378 [9:42:31<15:32:04, 12.29s/it] + 38%|███▊ | 2829/7378 [9:42:42<15:23:53, 12.19s/it] + +{'loss': 0.4682, 'learning_rate': 1.4134899814867055e-05, 'epoch': 0.38} + + 38%|███▊ | 2829/7378 [9:42:42<15:23:53, 12.19s/it] + 38%|███▊ | 2830/7378 [9:42:54<15:17:24, 12.10s/it] + +{'loss': 0.4839, 'learning_rate': 1.4130902145224989e-05, 'epoch': 0.38} + + 38%|███▊ | 2830/7378 [9:42:54<15:17:24, 12.10s/it] + 38%|███▊ | 2831/7378 [9:43:07<15:21:37, 12.16s/it] + +{'loss': 0.5383, 'learning_rate': 1.4126903679416272e-05, 'epoch': 0.38} + + 
38%|███▊ | 2831/7378 [9:43:07<15:21:37, 12.16s/it] + 38%|███▊ | 2832/7378 [9:43:19<15:20:15, 12.15s/it] + +{'loss': 0.4273, 'learning_rate': 1.412290441821155e-05, 'epoch': 0.38} + + 38%|███▊ | 2832/7378 [9:43:19<15:20:15, 12.15s/it] + 38%|███▊ | 2833/7378 [9:43:31<15:19:12, 12.13s/it] + +{'loss': 0.4874, 'learning_rate': 1.4118904362381609e-05, 'epoch': 0.38} + + 38%|███▊ | 2833/7378 [9:43:31<15:19:12, 12.13s/it] + 38%|███▊ | 2834/7378 [9:43:43<15:22:48, 12.18s/it] + +{'loss': 0.5107, 'learning_rate': 1.4114903512697407e-05, 'epoch': 0.38} + + 38%|███▊ | 2834/7378 [9:43:43<15:22:48, 12.18s/it] + 38%|███▊ | 2835/7378 [9:43:56<15:25:46, 12.23s/it] + +{'loss': 0.4997, 'learning_rate': 1.4110901869930039e-05, 'epoch': 0.38} + + 38%|███▊ | 2835/7378 [9:43:56<15:25:46, 12.23s/it] + 38%|███▊ | 2836/7378 [9:44:08<15:39:55, 12.42s/it] + +{'loss': 0.5143, 'learning_rate': 1.410689943485076e-05, 'epoch': 0.38} + + 38%|███▊ | 2836/7378 [9:44:08<15:39:55, 12.42s/it] + 38%|███▊ | 2837/7378 [9:44:21<15:34:23, 12.35s/it] + +{'loss': 0.459, 'learning_rate': 1.4102896208230979e-05, 'epoch': 0.38} + + 38%|███▊ | 2837/7378 [9:44:21<15:34:23, 12.35s/it] + 38%|███▊ | 2838/7378 [9:44:33<15:27:52, 12.26s/it] + +{'loss': 0.5112, 'learning_rate': 1.409889219084225e-05, 'epoch': 0.38} + + 38%|███▊ | 2838/7378 [9:44:33<15:27:52, 12.26s/it] + 38%|███▊ | 2839/7378 [9:44:45<15:26:43, 12.25s/it] + +{'loss': 0.4502, 'learning_rate': 1.409488738345629e-05, 'epoch': 0.38} + + 38%|███▊ | 2839/7378 [9:44:45<15:26:43, 12.25s/it] + 38%|███▊ | 2840/7378 [9:44:57<15:26:33, 12.25s/it] + +{'loss': 0.4988, 'learning_rate': 1.4090881786844958e-05, 'epoch': 0.38} + + 38%|███▊ | 2840/7378 [9:44:57<15:26:33, 12.25s/it] + 39%|███▊ | 2841/7378 [9:45:09<15:27:47, 12.27s/it] + +{'loss': 0.4746, 'learning_rate': 1.4086875401780276e-05, 'epoch': 0.39} + + 39%|███▊ | 2841/7378 [9:45:09<15:27:47, 12.27s/it] + 39%|███▊ | 2842/7378 [9:45:22<15:41:04, 12.45s/it] + +{'loss': 0.5077, 'learning_rate': 1.4082868229034405e-05, 
'epoch': 0.39} + + 39%|███▊ | 2842/7378 [9:45:22<15:41:04, 12.45s/it] + 39%|███▊ | 2843/7378 [9:45:34<15:33:41, 12.35s/it] + +{'loss': 0.4557, 'learning_rate': 1.4078860269379673e-05, 'epoch': 0.39} + + 39%|███▊ | 2843/7378 [9:45:34<15:33:41, 12.35s/it] + 39%|███▊ | 2844/7378 [9:45:47<15:35:24, 12.38s/it] + +{'loss': 0.5023, 'learning_rate': 1.4074851523588542e-05, 'epoch': 0.39} + + 39%|███▊ | 2844/7378 [9:45:47<15:35:24, 12.38s/it] + 39%|███▊ | 2845/7378 [9:45:59<15:35:22, 12.38s/it] + +{'loss': 0.4949, 'learning_rate': 1.4070841992433643e-05, 'epoch': 0.39} + + 39%|███▊ | 2845/7378 [9:45:59<15:35:22, 12.38s/it] + 39%|███▊ | 2846/7378 [9:46:12<15:34:43, 12.37s/it] + +{'loss': 0.4316, 'learning_rate': 1.4066831676687747e-05, 'epoch': 0.39} + + 39%|███▊ | 2846/7378 [9:46:12<15:34:43, 12.37s/it] + 39%|███▊ | 2847/7378 [9:46:24<15:29:12, 12.30s/it] + +{'loss': 0.4534, 'learning_rate': 1.4062820577123778e-05, 'epoch': 0.39} + + 39%|███▊ | 2847/7378 [9:46:24<15:29:12, 12.30s/it] + 39%|███▊ | 2848/7378 [9:46:37<15:41:33, 12.47s/it] + +{'loss': 0.517, 'learning_rate': 1.4058808694514814e-05, 'epoch': 0.39} + + 39%|███▊ | 2848/7378 [9:46:37<15:41:33, 12.47s/it] + 39%|███▊ | 2849/7378 [9:46:49<15:31:18, 12.34s/it] + +{'loss': 0.4844, 'learning_rate': 1.4054796029634082e-05, 'epoch': 0.39} + + 39%|███▊ | 2849/7378 [9:46:49<15:31:18, 12.34s/it] + 39%|███▊ | 2850/7378 [9:47:01<15:29:59, 12.32s/it] + +{'loss': 0.4503, 'learning_rate': 1.4050782583254963e-05, 'epoch': 0.39} + + 39%|███▊ | 2850/7378 [9:47:01<15:29:59, 12.32s/it] + 39%|███▊ | 2851/7378 [9:47:13<15:20:57, 12.21s/it] + +{'loss': 0.539, 'learning_rate': 1.404676835615098e-05, 'epoch': 0.39} + + 39%|███▊ | 2851/7378 [9:47:13<15:20:57, 12.21s/it] + 39%|███▊ | 2852/7378 [9:47:25<15:29:28, 12.32s/it] + +{'loss': 0.5172, 'learning_rate': 1.4042753349095817e-05, 'epoch': 0.39} + + 39%|███▊ | 2852/7378 [9:47:25<15:29:28, 12.32s/it] + 39%|███▊ | 2853/7378 [9:47:38<15:36:56, 12.42s/it] + +{'loss': 0.4737, 'learning_rate': 
1.4038737562863305e-05, 'epoch': 0.39} + + 39%|███▊ | 2853/7378 [9:47:38<15:36:56, 12.42s/it] + 39%|███▊ | 2854/7378 [9:47:50<15:33:34, 12.38s/it] + +{'loss': 0.4815, 'learning_rate': 1.403472099822742e-05, 'epoch': 0.39} + + 39%|███▊ | 2854/7378 [9:47:50<15:33:34, 12.38s/it] + 39%|███▊ | 2855/7378 [9:48:03<15:27:16, 12.30s/it] + +{'loss': 0.4655, 'learning_rate': 1.4030703655962295e-05, 'epoch': 0.39} + + 39%|███▊ | 2855/7378 [9:48:03<15:27:16, 12.30s/it] + 39%|███▊ | 2856/7378 [9:48:15<15:28:54, 12.33s/it] + +{'loss': 0.5177, 'learning_rate': 1.4026685536842206e-05, 'epoch': 0.39} + + 39%|███▊ | 2856/7378 [9:48:15<15:28:54, 12.33s/it] + 39%|███▊ | 2857/7378 [9:48:28<15:36:22, 12.43s/it] + +{'loss': 0.4598, 'learning_rate': 1.4022666641641589e-05, 'epoch': 0.39} + + 39%|███▊ | 2857/7378 [9:48:28<15:36:22, 12.43s/it] + 39%|███▊ | 2858/7378 [9:48:40<15:30:04, 12.35s/it] + +{'loss': 0.4388, 'learning_rate': 1.4018646971135015e-05, 'epoch': 0.39} + + 39%|███▊ | 2858/7378 [9:48:40<15:30:04, 12.35s/it] + 39%|███▉ | 2859/7378 [9:48:52<15:37:13, 12.44s/it] + +{'loss': 0.5346, 'learning_rate': 1.4014626526097218e-05, 'epoch': 0.39} + + 39%|███▉ | 2859/7378 [9:48:52<15:37:13, 12.44s/it] + 39%|███▉ | 2860/7378 [9:49:05<15:46:00, 12.56s/it] + +{'loss': 0.4737, 'learning_rate': 1.4010605307303074e-05, 'epoch': 0.39} + + 39%|███▉ | 2860/7378 [9:49:05<15:46:00, 12.56s/it] + 39%|███▉ | 2861/7378 [9:49:18<15:40:14, 12.49s/it] + +{'loss': 0.5044, 'learning_rate': 1.4006583315527609e-05, 'epoch': 0.39} + + 39%|███▉ | 2861/7378 [9:49:18<15:40:14, 12.49s/it] + 39%|███▉ | 2862/7378 [9:49:30<15:32:39, 12.39s/it] + +{'loss': 0.4326, 'learning_rate': 1.4002560551546001e-05, 'epoch': 0.39} + + 39%|███▉ | 2862/7378 [9:49:30<15:32:39, 12.39s/it] + 39%|███▉ | 2863/7378 [9:49:42<15:23:00, 12.27s/it] + +{'loss': 0.4212, 'learning_rate': 1.3998537016133571e-05, 'epoch': 0.39} + + 39%|███▉ | 2863/7378 [9:49:42<15:23:00, 12.27s/it] + 39%|███▉ | 2864/7378 [9:49:54<15:25:06, 12.30s/it] + +{'loss': 
0.4701, 'learning_rate': 1.39945127100658e-05, 'epoch': 0.39} + + 39%|███▉ | 2864/7378 [9:49:54<15:25:06, 12.30s/it] + 39%|███▉ | 2865/7378 [9:50:06<15:21:49, 12.26s/it] + +{'loss': 0.494, 'learning_rate': 1.3990487634118299e-05, 'epoch': 0.39} + + 39%|███▉ | 2865/7378 [9:50:06<15:21:49, 12.26s/it] + 39%|███▉ | 2866/7378 [9:50:19<15:24:15, 12.29s/it] + +{'loss': 0.4363, 'learning_rate': 1.398646178906685e-05, 'epoch': 0.39} + + 39%|███▉ | 2866/7378 [9:50:19<15:24:15, 12.29s/it] + 39%|███▉ | 2867/7378 [9:50:31<15:37:51, 12.47s/it] + +{'loss': 0.4915, 'learning_rate': 1.3982435175687363e-05, 'epoch': 0.39} + + 39%|███▉ | 2867/7378 [9:50:31<15:37:51, 12.47s/it] + 39%|███▉ | 2868/7378 [9:50:44<15:27:44, 12.34s/it] + +{'loss': 0.4707, 'learning_rate': 1.3978407794755906e-05, 'epoch': 0.39} + + 39%|███▉ | 2868/7378 [9:50:44<15:27:44, 12.34s/it] + 39%|███▉ | 2869/7378 [9:50:56<15:28:35, 12.36s/it] + +{'loss': 0.6144, 'learning_rate': 1.3974379647048698e-05, 'epoch': 0.39} + + 39%|███▉ | 2869/7378 [9:50:56<15:28:35, 12.36s/it] + 39%|███▉ | 2870/7378 [9:51:08<15:29:47, 12.38s/it] + +{'loss': 0.4452, 'learning_rate': 1.3970350733342094e-05, 'epoch': 0.39} + + 39%|███▉ | 2870/7378 [9:51:08<15:29:47, 12.38s/it] + 39%|███▉ | 2871/7378 [9:51:21<15:31:18, 12.40s/it] + +{'loss': 0.5181, 'learning_rate': 1.3966321054412613e-05, 'epoch': 0.39} + + 39%|███▉ | 2871/7378 [9:51:21<15:31:18, 12.40s/it] + 39%|███▉ | 2872/7378 [9:51:33<15:17:26, 12.22s/it] + +{'loss': 0.4311, 'learning_rate': 1.3962290611036904e-05, 'epoch': 0.39} + + 39%|███▉ | 2872/7378 [9:51:33<15:17:26, 12.22s/it] + 39%|███▉ | 2873/7378 [9:51:45<15:22:41, 12.29s/it] + +{'loss': 0.4935, 'learning_rate': 1.3958259403991776e-05, 'epoch': 0.39} + + 39%|███▉ | 2873/7378 [9:51:45<15:22:41, 12.29s/it] + 39%|███▉ | 2874/7378 [9:51:57<15:23:15, 12.30s/it] + +{'loss': 0.4268, 'learning_rate': 1.3954227434054182e-05, 'epoch': 0.39} + + 39%|███▉ | 2874/7378 [9:51:57<15:23:15, 12.30s/it] + 39%|███▉ | 2875/7378 [9:52:10<15:29:03, 
12.38s/it] + +{'loss': 0.4535, 'learning_rate': 1.395019470200122e-05, 'epoch': 0.39} + + 39%|███▉ | 2875/7378 [9:52:10<15:29:03, 12.38s/it] + 39%|███▉ | 2876/7378 [9:52:22<15:22:13, 12.29s/it] + +{'loss': 0.5216, 'learning_rate': 1.3946161208610134e-05, 'epoch': 0.39} + + 39%|███▉ | 2876/7378 [9:52:22<15:22:13, 12.29s/it] + 39%|███▉ | 2877/7378 [9:52:34<15:16:34, 12.22s/it] + +{'loss': 0.4534, 'learning_rate': 1.3942126954658317e-05, 'epoch': 0.39} + + 39%|███▉ | 2877/7378 [9:52:34<15:16:34, 12.22s/it] + 39%|███▉ | 2878/7378 [9:52:46<15:20:52, 12.28s/it] + +{'loss': 0.4094, 'learning_rate': 1.3938091940923312e-05, 'epoch': 0.39} + + 39%|███▉ | 2878/7378 [9:52:46<15:20:52, 12.28s/it] + 39%|███▉ | 2879/7378 [9:52:58<15:08:21, 12.11s/it] + +{'loss': 0.4619, 'learning_rate': 1.3934056168182802e-05, 'epoch': 0.39} + + 39%|███▉ | 2879/7378 [9:52:58<15:08:21, 12.11s/it] + 39%|███▉ | 2880/7378 [9:53:10<15:10:52, 12.15s/it] + +{'loss': 0.4715, 'learning_rate': 1.3930019637214618e-05, 'epoch': 0.39} + + 39%|███▉ | 2880/7378 [9:53:10<15:10:52, 12.15s/it] + 39%|███▉ | 2881/7378 [9:53:23<15:09:29, 12.13s/it] + +{'loss': 0.4929, 'learning_rate': 1.3925982348796739e-05, 'epoch': 0.39} + + 39%|███▉ | 2881/7378 [9:53:23<15:09:29, 12.13s/it] + 39%|███▉ | 2882/7378 [9:53:35<15:17:10, 12.24s/it] + +{'loss': 0.4683, 'learning_rate': 1.3921944303707287e-05, 'epoch': 0.39} + + 39%|███▉ | 2882/7378 [9:53:35<15:17:10, 12.24s/it] + 39%|███▉ | 2883/7378 [9:53:47<15:18:45, 12.26s/it] + +{'loss': 0.5011, 'learning_rate': 1.3917905502724538e-05, 'epoch': 0.39} + + 39%|███▉ | 2883/7378 [9:53:47<15:18:45, 12.26s/it] + 39%|███▉ | 2884/7378 [9:54:00<15:17:42, 12.25s/it] + +{'loss': 0.4405, 'learning_rate': 1.3913865946626898e-05, 'epoch': 0.39} + + 39%|███▉ | 2884/7378 [9:54:00<15:17:42, 12.25s/it] + 39%|███▉ | 2885/7378 [9:54:12<15:14:18, 12.21s/it] + +{'loss': 0.4489, 'learning_rate': 1.390982563619294e-05, 'epoch': 0.39} + + 39%|███▉ | 2885/7378 [9:54:12<15:14:18, 12.21s/it] + 39%|███▉ | 
2886/7378 [9:54:24<15:12:24, 12.19s/it] + +{'loss': 0.4663, 'learning_rate': 1.390578457220136e-05, 'epoch': 0.39} + + 39%|███▉ | 2886/7378 [9:54:24<15:12:24, 12.19s/it] + 39%|███▉ | 2887/7378 [9:54:36<15:18:29, 12.27s/it] + +{'loss': 0.4051, 'learning_rate': 1.3901742755431016e-05, 'epoch': 0.39} + + 39%|███▉ | 2887/7378 [9:54:36<15:18:29, 12.27s/it] + 39%|███▉ | 2888/7378 [9:54:48<15:17:01, 12.25s/it] + +{'loss': 0.45, 'learning_rate': 1.38977001866609e-05, 'epoch': 0.39} + + 39%|███▉ | 2888/7378 [9:54:49<15:17:01, 12.25s/it] + 39%|███▉ | 2889/7378 [9:55:01<15:20:56, 12.31s/it] + +{'loss': 0.4552, 'learning_rate': 1.389365686667016e-05, 'epoch': 0.39} + + 39%|███▉ | 2889/7378 [9:55:01<15:20:56, 12.31s/it] + 39%|███▉ | 2890/7378 [9:55:13<15:22:23, 12.33s/it] + +{'loss': 0.4405, 'learning_rate': 1.3889612796238078e-05, 'epoch': 0.39} + + 39%|███▉ | 2890/7378 [9:55:13<15:22:23, 12.33s/it] + 39%|███▉ | 2891/7378 [9:55:25<15:16:22, 12.25s/it] + +{'loss': 0.4368, 'learning_rate': 1.3885567976144088e-05, 'epoch': 0.39} + + 39%|███▉ | 2891/7378 [9:55:25<15:16:22, 12.25s/it] + 39%|███▉ | 2892/7378 [9:55:38<15:14:09, 12.23s/it] + +{'loss': 0.4597, 'learning_rate': 1.3881522407167763e-05, 'epoch': 0.39} + + 39%|███▉ | 2892/7378 [9:55:38<15:14:09, 12.23s/it] + 39%|███▉ | 2893/7378 [9:55:50<15:10:18, 12.18s/it] + +{'loss': 0.455, 'learning_rate': 1.3877476090088822e-05, 'epoch': 0.39} + + 39%|███▉ | 2893/7378 [9:55:50<15:10:18, 12.18s/it] + 39%|███▉ | 2894/7378 [9:56:02<15:16:51, 12.27s/it] + +{'loss': 0.4289, 'learning_rate': 1.3873429025687132e-05, 'epoch': 0.39} + + 39%|███▉ | 2894/7378 [9:56:02<15:16:51, 12.27s/it] + 39%|███▉ | 2895/7378 [9:56:14<15:05:39, 12.12s/it] + +{'loss': 0.4831, 'learning_rate': 1.3869381214742701e-05, 'epoch': 0.39} + + 39%|███▉ | 2895/7378 [9:56:14<15:05:39, 12.12s/it] + 39%|███▉ | 2896/7378 [9:56:26<14:58:48, 12.03s/it] + +{'loss': 0.4945, 'learning_rate': 1.386533265803568e-05, 'epoch': 0.39} + + 39%|███▉ | 2896/7378 [9:56:26<14:58:48, 
12.03s/it] + 39%|███▉ | 2897/7378 [9:56:38<14:54:12, 11.97s/it] + +{'loss': 0.4476, 'learning_rate': 1.3861283356346367e-05, 'epoch': 0.39} + + 39%|███▉ | 2897/7378 [9:56:38<14:54:12, 11.97s/it] + 39%|███▉ | 2898/7378 [9:56:50<14:56:59, 12.01s/it] + +{'loss': 0.4921, 'learning_rate': 1.3857233310455199e-05, 'epoch': 0.39} + + 39%|███▉ | 2898/7378 [9:56:50<14:56:59, 12.01s/it] + 39%|███▉ | 2899/7378 [9:57:02<15:05:06, 12.12s/it] + +{'loss': 0.4707, 'learning_rate': 1.385318252114276e-05, 'epoch': 0.39} + + 39%|███▉ | 2899/7378 [9:57:02<15:05:06, 12.12s/it] + 39%|███▉ | 2900/7378 [9:57:14<15:01:28, 12.08s/it] + +{'loss': 0.4527, 'learning_rate': 1.3849130989189773e-05, 'epoch': 0.39} + + 39%|███▉ | 2900/7378 [9:57:14<15:01:28, 12.08s/it] + 39%|███▉ | 2901/7378 [9:57:27<15:12:30, 12.23s/it] + +{'loss': 0.5186, 'learning_rate': 1.3845078715377116e-05, 'epoch': 0.39} + + 39%|███▉ | 2901/7378 [9:57:27<15:12:30, 12.23s/it] + 39%|███▉ | 2902/7378 [9:57:39<15:14:19, 12.26s/it] + +{'loss': 0.4772, 'learning_rate': 1.3841025700485793e-05, 'epoch': 0.39} + + 39%|███▉ | 2902/7378 [9:57:39<15:14:19, 12.26s/it] + 39%|███▉ | 2903/7378 [9:57:52<15:23:45, 12.39s/it] + +{'loss': 0.4561, 'learning_rate': 1.383697194529696e-05, 'epoch': 0.39} + + 39%|███▉ | 2903/7378 [9:57:52<15:23:45, 12.39s/it] + 39%|███▉ | 2904/7378 [9:58:04<15:14:25, 12.26s/it] + +{'loss': 0.4863, 'learning_rate': 1.3832917450591918e-05, 'epoch': 0.39} + + 39%|███▉ | 2904/7378 [9:58:04<15:14:25, 12.26s/it] + 39%|███▉ | 2905/7378 [9:58:16<15:13:44, 12.26s/it] + +{'loss': 0.4884, 'learning_rate': 1.3828862217152104e-05, 'epoch': 0.39} + + 39%|███▉ | 2905/7378 [9:58:16<15:13:44, 12.26s/it] + 39%|███▉ | 2906/7378 [9:58:28<15:15:22, 12.28s/it] + +{'loss': 0.4863, 'learning_rate': 1.3824806245759107e-05, 'epoch': 0.39} + + 39%|███▉ | 2906/7378 [9:58:28<15:15:22, 12.28s/it] + 39%|███▉ | 2907/7378 [9:58:41<15:19:05, 12.33s/it] + +{'loss': 0.4835, 'learning_rate': 1.3820749537194641e-05, 'epoch': 0.39} + + 39%|███▉ | 
2907/7378 [9:58:41<15:19:05, 12.33s/it] + 39%|███▉ | 2908/7378 [9:58:53<15:18:16, 12.33s/it] + +{'loss': 0.5294, 'learning_rate': 1.3816692092240584e-05, 'epoch': 0.39} + + 39%|███▉ | 2908/7378 [9:58:53<15:18:16, 12.33s/it] + 39%|███▉ | 2909/7378 [9:59:05<15:15:00, 12.28s/it] + +{'loss': 0.5137, 'learning_rate': 1.3812633911678938e-05, 'epoch': 0.39} + + 39%|███▉ | 2909/7378 [9:59:05<15:15:00, 12.28s/it] + 39%|███▉ | 2910/7378 [9:59:18<15:17:46, 12.32s/it] + +{'loss': 0.5101, 'learning_rate': 1.3808574996291858e-05, 'epoch': 0.39} + + 39%|███▉ | 2910/7378 [9:59:18<15:17:46, 12.32s/it] + 39%|███▉ | 2911/7378 [9:59:30<15:22:08, 12.39s/it] + +{'loss': 0.4749, 'learning_rate': 1.3804515346861633e-05, 'epoch': 0.39} + + 39%|███▉ | 2911/7378 [9:59:30<15:22:08, 12.39s/it] + 39%|███▉ | 2912/7378 [9:59:43<15:25:47, 12.44s/it] + +{'loss': 0.494, 'learning_rate': 1.3800454964170697e-05, 'epoch': 0.39} + + 39%|███▉ | 2912/7378 [9:59:43<15:25:47, 12.44s/it] + 39%|███▉ | 2913/7378 [9:59:56<15:36:06, 12.58s/it] + +{'loss': 0.4365, 'learning_rate': 1.3796393849001628e-05, 'epoch': 0.39} + + 39%|███▉ | 2913/7378 [9:59:56<15:36:06, 12.58s/it] + 39%|███▉ | 2914/7378 [10:00:08<15:23:21, 12.41s/it] + +{'loss': 0.4322, 'learning_rate': 1.3792332002137138e-05, 'epoch': 0.39} + + 39%|███▉ | 2914/7378 [10:00:08<15:23:21, 12.41s/it] + 40%|███▉ | 2915/7378 [10:00:20<15:16:41, 12.32s/it] + +{'loss': 0.3947, 'learning_rate': 1.378826942436009e-05, 'epoch': 0.4} + + 40%|███▉ | 2915/7378 [10:00:20<15:16:41, 12.32s/it] + 40%|███▉ | 2916/7378 [10:00:32<15:08:49, 12.22s/it] + +{'loss': 0.4114, 'learning_rate': 1.3784206116453475e-05, 'epoch': 0.4} + + 40%|███▉ | 2916/7378 [10:00:32<15:08:49, 12.22s/it] + 40%|███▉ | 2917/7378 [10:00:44<15:06:41, 12.19s/it] + +{'loss': 0.5348, 'learning_rate': 1.3780142079200439e-05, 'epoch': 0.4} + + 40%|███▉ | 2917/7378 [10:00:44<15:06:41, 12.19s/it] + 40%|███▉ | 2918/7378 [10:00:56<15:10:00, 12.24s/it] + +{'loss': 0.4457, 'learning_rate': 1.377607731338426e-05, 
'epoch': 0.4} + + 40%|███▉ | 2918/7378 [10:00:56<15:10:00, 12.24s/it] + 40%|███▉ | 2919/7378 [10:01:08<15:09:48, 12.24s/it] + +{'loss': 0.4891, 'learning_rate': 1.3772011819788354e-05, 'epoch': 0.4} + + 40%|███▉ | 2919/7378 [10:01:08<15:09:48, 12.24s/it] + 40%|███▉ | 2920/7378 [10:01:21<15:19:23, 12.37s/it] + +{'loss': 0.448, 'learning_rate': 1.3767945599196284e-05, 'epoch': 0.4} + + 40%|███▉ | 2920/7378 [10:01:21<15:19:23, 12.37s/it] + 40%|███▉ | 2921/7378 [10:01:34<15:23:13, 12.43s/it] + +{'loss': 0.5013, 'learning_rate': 1.3763878652391749e-05, 'epoch': 0.4} + + 40%|███▉ | 2921/7378 [10:01:34<15:23:13, 12.43s/it] + 40%|███▉ | 2922/7378 [10:01:46<15:26:21, 12.47s/it] + +{'loss': 0.477, 'learning_rate': 1.3759810980158592e-05, 'epoch': 0.4} + + 40%|███▉ | 2922/7378 [10:01:46<15:26:21, 12.47s/it] + 40%|███▉ | 2923/7378 [10:01:59<15:41:58, 12.69s/it] + +{'loss': 0.4712, 'learning_rate': 1.3755742583280792e-05, 'epoch': 0.4} + + 40%|███▉ | 2923/7378 [10:01:59<15:41:58, 12.69s/it] + 40%|███▉ | 2924/7378 [10:02:11<15:29:35, 12.52s/it] + +{'loss': 0.5082, 'learning_rate': 1.3751673462542465e-05, 'epoch': 0.4} + + 40%|███▉ | 2924/7378 [10:02:12<15:29:35, 12.52s/it] + 40%|███▉ | 2925/7378 [10:02:24<15:26:38, 12.49s/it] + +{'loss': 0.431, 'learning_rate': 1.3747603618727875e-05, 'epoch': 0.4} + + 40%|███▉ | 2925/7378 [10:02:24<15:26:38, 12.49s/it] + 40%|███▉ | 2926/7378 [10:02:36<15:17:36, 12.37s/it] + +{'loss': 0.4877, 'learning_rate': 1.3743533052621415e-05, 'epoch': 0.4} + + 40%|███▉ | 2926/7378 [10:02:36<15:17:36, 12.37s/it] + 40%|███▉ | 2927/7378 [10:02:48<15:05:38, 12.21s/it] + +{'loss': 0.4752, 'learning_rate': 1.3739461765007631e-05, 'epoch': 0.4} + + 40%|███▉ | 2927/7378 [10:02:48<15:05:38, 12.21s/it] + 40%|███▉ | 2928/7378 [10:03:01<15:19:16, 12.39s/it] + +{'loss': 0.4327, 'learning_rate': 1.3735389756671193e-05, 'epoch': 0.4} + + 40%|███▉ | 2928/7378 [10:03:01<15:19:16, 12.39s/it] + 40%|███▉ | 2929/7378 [10:03:13<15:27:10, 12.50s/it] + +{'loss': 0.4464, 
'learning_rate': 1.3731317028396922e-05, 'epoch': 0.4} + + 40%|███▉ | 2929/7378 [10:03:13<15:27:10, 12.50s/it] + 40%|███▉ | 2930/7378 [10:03:26<15:22:39, 12.45s/it] + +{'loss': 0.5096, 'learning_rate': 1.3727243580969767e-05, 'epoch': 0.4} + + 40%|███▉ | 2930/7378 [10:03:26<15:22:39, 12.45s/it] + 40%|███▉ | 2931/7378 [10:03:38<15:18:48, 12.40s/it] + +{'loss': 0.4567, 'learning_rate': 1.3723169415174827e-05, 'epoch': 0.4} + + 40%|███▉ | 2931/7378 [10:03:38<15:18:48, 12.40s/it] + 40%|███▉ | 2932/7378 [10:03:50<15:15:58, 12.36s/it] + +{'loss': 0.4841, 'learning_rate': 1.371909453179733e-05, 'epoch': 0.4} + + 40%|███▉ | 2932/7378 [10:03:50<15:15:58, 12.36s/it] + 40%|███▉ | 2933/7378 [10:04:03<15:14:43, 12.35s/it] + +{'loss': 0.4801, 'learning_rate': 1.3715018931622644e-05, 'epoch': 0.4} + + 40%|███▉ | 2933/7378 [10:04:03<15:14:43, 12.35s/it] + 40%|███▉ | 2934/7378 [10:04:15<15:13:22, 12.33s/it] + +{'loss': 0.4516, 'learning_rate': 1.3710942615436282e-05, 'epoch': 0.4} + + 40%|███▉ | 2934/7378 [10:04:15<15:13:22, 12.33s/it] + 40%|███▉ | 2935/7378 [10:04:27<15:07:19, 12.25s/it] + +{'loss': 0.4789, 'learning_rate': 1.3706865584023884e-05, 'epoch': 0.4} + + 40%|███▉ | 2935/7378 [10:04:27<15:07:19, 12.25s/it] + 40%|███▉ | 2936/7378 [10:04:39<15:03:07, 12.20s/it] + +{'loss': 0.4396, 'learning_rate': 1.3702787838171243e-05, 'epoch': 0.4} + + 40%|███▉ | 2936/7378 [10:04:39<15:03:07, 12.20s/it] + 40%|███▉ | 2937/7378 [10:04:51<15:07:20, 12.26s/it] + +{'loss': 0.5408, 'learning_rate': 1.369870937866427e-05, 'epoch': 0.4} + + 40%|███▉ | 2937/7378 [10:04:51<15:07:20, 12.26s/it] + 40%|███▉ | 2938/7378 [10:05:03<15:02:25, 12.19s/it] + +{'loss': 0.4308, 'learning_rate': 1.3694630206289033e-05, 'epoch': 0.4} + + 40%|███▉ | 2938/7378 [10:05:03<15:02:25, 12.19s/it] + 40%|███▉ | 2939/7378 [10:05:16<15:08:15, 12.28s/it] + +{'loss': 0.5178, 'learning_rate': 1.3690550321831724e-05, 'epoch': 0.4} + + 40%|███▉ | 2939/7378 [10:05:16<15:08:15, 12.28s/it] + 40%|███▉ | 2940/7378 
[10:05:28<15:07:03, 12.26s/it] + +{'loss': 0.4957, 'learning_rate': 1.3686469726078676e-05, 'epoch': 0.4} + + 40%|███▉ | 2940/7378 [10:05:28<15:07:03, 12.26s/it] + 40%|███▉ | 2941/7378 [10:05:40<15:03:04, 12.21s/it] + +{'loss': 0.4947, 'learning_rate': 1.3682388419816365e-05, 'epoch': 0.4} + + 40%|███▉ | 2941/7378 [10:05:40<15:03:04, 12.21s/it] + 40%|███▉ | 2942/7378 [10:05:53<15:08:46, 12.29s/it] + +{'loss': 0.5083, 'learning_rate': 1.367830640383139e-05, 'epoch': 0.4} + + 40%|███▉ | 2942/7378 [10:05:53<15:08:46, 12.29s/it] + 40%|███▉ | 2943/7378 [10:06:05<15:05:04, 12.24s/it] + +{'loss': 0.4822, 'learning_rate': 1.3674223678910505e-05, 'epoch': 0.4} + + 40%|███▉ | 2943/7378 [10:06:05<15:05:04, 12.24s/it] + 40%|███▉ | 2944/7378 [10:06:17<15:05:08, 12.25s/it] + +{'loss': 0.4448, 'learning_rate': 1.3670140245840584e-05, 'epoch': 0.4} + + 40%|███▉ | 2944/7378 [10:06:17<15:05:08, 12.25s/it] + 40%|███▉ | 2945/7378 [10:06:30<15:13:35, 12.37s/it] + +{'loss': 0.4565, 'learning_rate': 1.3666056105408652e-05, 'epoch': 0.4} + + 40%|███▉ | 2945/7378 [10:06:30<15:13:35, 12.37s/it] + 40%|███▉ | 2946/7378 [10:06:42<15:09:30, 12.31s/it] + +{'loss': 0.4993, 'learning_rate': 1.3661971258401859e-05, 'epoch': 0.4} + + 40%|███▉ | 2946/7378 [10:06:42<15:09:30, 12.31s/it] + 40%|███▉ | 2947/7378 [10:06:54<15:08:06, 12.30s/it] + +{'loss': 0.483, 'learning_rate': 1.3657885705607492e-05, 'epoch': 0.4} + + 40%|███▉ | 2947/7378 [10:06:54<15:08:06, 12.30s/it] + 40%|███▉ | 2948/7378 [10:07:07<15:11:34, 12.35s/it] + +{'loss': 0.4635, 'learning_rate': 1.365379944781298e-05, 'epoch': 0.4} + + 40%|███▉ | 2948/7378 [10:07:07<15:11:34, 12.35s/it] + 40%|███▉ | 2949/7378 [10:07:19<15:13:00, 12.37s/it] + +{'loss': 0.4411, 'learning_rate': 1.3649712485805888e-05, 'epoch': 0.4} + + 40%|███▉ | 2949/7378 [10:07:19<15:13:00, 12.37s/it] + 40%|███▉ | 2950/7378 [10:07:32<15:23:44, 12.52s/it] + +{'loss': 0.4873, 'learning_rate': 1.3645624820373913e-05, 'epoch': 0.4} + + 40%|███▉ | 2950/7378 [10:07:32<15:23:44, 
12.52s/it] + 40%|███▉ | 2951/7378 [10:07:44<15:13:40, 12.38s/it] + +{'loss': 0.4438, 'learning_rate': 1.3641536452304884e-05, 'epoch': 0.4} + + 40%|███▉ | 2951/7378 [10:07:44<15:13:40, 12.38s/it] + 40%|████ | 2952/7378 [10:07:57<15:17:47, 12.44s/it] + +{'loss': 0.4395, 'learning_rate': 1.3637447382386774e-05, 'epoch': 0.4} + + 40%|████ | 2952/7378 [10:07:57<15:17:47, 12.44s/it] + 40%|████ | 2953/7378 [10:08:09<15:06:41, 12.29s/it] + +{'loss': 0.4841, 'learning_rate': 1.3633357611407687e-05, 'epoch': 0.4} + + 40%|████ | 2953/7378 [10:08:09<15:06:41, 12.29s/it] + 40%|████ | 2954/7378 [10:08:21<14:59:12, 12.20s/it] + +{'loss': 0.5, 'learning_rate': 1.362926714015586e-05, 'epoch': 0.4} + + 40%|████ | 2954/7378 [10:08:21<14:59:12, 12.20s/it] + 40%|████ | 2955/7378 [10:08:33<15:07:53, 12.32s/it] + +{'loss': 0.5, 'learning_rate': 1.362517596941967e-05, 'epoch': 0.4} + + 40%|████ | 2955/7378 [10:08:33<15:07:53, 12.32s/it] + 40%|████ | 2956/7378 [10:08:45<15:06:22, 12.30s/it] + +{'loss': 0.4205, 'learning_rate': 1.3621084099987623e-05, 'epoch': 0.4} + + 40%|████ | 2956/7378 [10:08:45<15:06:22, 12.30s/it] + 40%|████ | 2957/7378 [10:08:58<15:05:21, 12.29s/it] + +{'loss': 0.4525, 'learning_rate': 1.3616991532648365e-05, 'epoch': 0.4} + + 40%|████ | 2957/7378 [10:08:58<15:05:21, 12.29s/it] + 40%|████ | 2958/7378 [10:09:10<15:11:24, 12.37s/it] + +{'loss': 0.5002, 'learning_rate': 1.3612898268190671e-05, 'epoch': 0.4} + + 40%|████ | 2958/7378 [10:09:10<15:11:24, 12.37s/it] + 40%|████ | 2959/7378 [10:09:23<15:11:48, 12.38s/it] + +{'loss': 0.5106, 'learning_rate': 1.3608804307403461e-05, 'epoch': 0.4} + + 40%|████ | 2959/7378 [10:09:23<15:11:48, 12.38s/it] + 40%|████ | 2960/7378 [10:09:35<15:05:59, 12.30s/it] + +{'loss': 0.4558, 'learning_rate': 1.3604709651075771e-05, 'epoch': 0.4} + + 40%|████ | 2960/7378 [10:09:35<15:05:59, 12.30s/it] + 40%|████ | 2961/7378 [10:09:47<15:04:17, 12.28s/it] + +{'loss': 0.4438, 'learning_rate': 1.3600614299996791e-05, 'epoch': 0.4} + + 40%|████ | 
2961/7378 [10:09:47<15:04:17, 12.28s/it] + 40%|████ | 2962/7378 [10:09:59<15:01:46, 12.25s/it] + +{'loss': 0.4572, 'learning_rate': 1.359651825495583e-05, 'epoch': 0.4} + + 40%|████ | 2962/7378 [10:09:59<15:01:46, 12.25s/it] + 40%|████ | 2963/7378 [10:10:12<15:07:54, 12.34s/it] + +{'loss': 0.5417, 'learning_rate': 1.3592421516742342e-05, 'epoch': 0.4} + + 40%|████ | 2963/7378 [10:10:12<15:07:54, 12.34s/it] + 40%|████ | 2964/7378 [10:10:24<15:04:47, 12.30s/it] + +{'loss': 0.4471, 'learning_rate': 1.3588324086145906e-05, 'epoch': 0.4} + + 40%|████ | 2964/7378 [10:10:24<15:04:47, 12.30s/it] + 40%|████ | 2965/7378 [10:10:37<15:17:07, 12.47s/it] + +{'loss': 0.4791, 'learning_rate': 1.3584225963956235e-05, 'epoch': 0.4} + + 40%|████ | 2965/7378 [10:10:37<15:17:07, 12.47s/it] + 40%|████ | 2966/7378 [10:10:49<15:10:14, 12.38s/it] + +{'loss': 0.4782, 'learning_rate': 1.3580127150963183e-05, 'epoch': 0.4} + + 40%|████ | 2966/7378 [10:10:49<15:10:14, 12.38s/it] + 40%|████ | 2967/7378 [10:11:01<15:08:51, 12.36s/it] + +{'loss': 0.429, 'learning_rate': 1.3576027647956727e-05, 'epoch': 0.4} + + 40%|████ | 2967/7378 [10:11:01<15:08:51, 12.36s/it] + 40%|████ | 2968/7378 [10:11:14<15:11:41, 12.40s/it] + +{'loss': 0.4588, 'learning_rate': 1.3571927455726985e-05, 'epoch': 0.4} + + 40%|████ | 2968/7378 [10:11:14<15:11:41, 12.40s/it] + 40%|████ | 2969/7378 [10:11:26<15:13:10, 12.43s/it] + +{'loss': 0.4091, 'learning_rate': 1.3567826575064205e-05, 'epoch': 0.4} + + 40%|████ | 2969/7378 [10:11:26<15:13:10, 12.43s/it] + 40%|████ | 2970/7378 [10:11:39<15:11:06, 12.40s/it] + +{'loss': 0.4592, 'learning_rate': 1.3563725006758766e-05, 'epoch': 0.4} + + 40%|████ | 2970/7378 [10:11:39<15:11:06, 12.40s/it] + 40%|████ | 2971/7378 [10:11:51<15:07:12, 12.35s/it] + +{'loss': 0.409, 'learning_rate': 1.3559622751601182e-05, 'epoch': 0.4} + + 40%|████ | 2971/7378 [10:11:51<15:07:12, 12.35s/it] + 40%|████ | 2972/7378 [10:12:03<15:04:12, 12.31s/it] + +{'loss': 0.4378, 'learning_rate': 
1.3555519810382095e-05, 'epoch': 0.4} + + 40%|████ | 2972/7378 [10:12:03<15:04:12, 12.31s/it] + 40%|████ | 2973/7378 [10:12:15<14:58:48, 12.24s/it] + +{'loss': 0.4223, 'learning_rate': 1.355141618389229e-05, 'epoch': 0.4} + + 40%|████ | 2973/7378 [10:12:15<14:58:48, 12.24s/it] + 40%|████ | 2974/7378 [10:12:27<14:58:16, 12.24s/it] + +{'loss': 0.4778, 'learning_rate': 1.3547311872922668e-05, 'epoch': 0.4} + + 40%|████ | 2974/7378 [10:12:27<14:58:16, 12.24s/it] + 40%|████ | 2975/7378 [10:12:39<14:52:15, 12.16s/it] + +{'loss': 0.4773, 'learning_rate': 1.354320687826428e-05, 'epoch': 0.4} + + 40%|████ | 2975/7378 [10:12:39<14:52:15, 12.16s/it] + 40%|████ | 2976/7378 [10:12:52<14:56:09, 12.21s/it] + +{'loss': 0.49, 'learning_rate': 1.353910120070829e-05, 'epoch': 0.4} + + 40%|████ | 2976/7378 [10:12:52<14:56:09, 12.21s/it] + 40%|████ | 2977/7378 [10:13:04<14:58:43, 12.25s/it] + +{'loss': 0.4584, 'learning_rate': 1.3534994841046007e-05, 'epoch': 0.4} + + 40%|████ | 2977/7378 [10:13:04<14:58:43, 12.25s/it] + 40%|████ | 2978/7378 [10:13:16<14:57:21, 12.24s/it] + +{'loss': 0.4673, 'learning_rate': 1.3530887800068872e-05, 'epoch': 0.4} + + 40%|████ | 2978/7378 [10:13:16<14:57:21, 12.24s/it] + 40%|████ | 2979/7378 [10:13:28<14:54:44, 12.20s/it] + +{'loss': 0.4656, 'learning_rate': 1.3526780078568444e-05, 'epoch': 0.4} + + 40%|████ | 2979/7378 [10:13:28<14:54:44, 12.20s/it] + 40%|████ | 2980/7378 [10:13:41<14:57:38, 12.25s/it] + +{'loss': 0.4733, 'learning_rate': 1.352267167733643e-05, 'epoch': 0.4} + + 40%|████ | 2980/7378 [10:13:41<14:57:38, 12.25s/it] + 40%|████ | 2981/7378 [10:13:53<14:56:19, 12.23s/it] + +{'loss': 0.4038, 'learning_rate': 1.3518562597164656e-05, 'epoch': 0.4} + + 40%|████ | 2981/7378 [10:13:53<14:56:19, 12.23s/it] + 40%|████ | 2982/7378 [10:14:05<14:52:59, 12.19s/it] + +{'loss': 0.4296, 'learning_rate': 1.351445283884508e-05, 'epoch': 0.4} + + 40%|████ | 2982/7378 [10:14:05<14:52:59, 12.19s/it] + 40%|████ | 2983/7378 [10:14:18<15:00:41, 12.30s/it] + 
+{'loss': 0.4554, 'learning_rate': 1.3510342403169799e-05, 'epoch': 0.4} + + 40%|████ | 2983/7378 [10:14:18<15:00:41, 12.30s/it] + 40%|████ | 2984/7378 [10:14:30<14:55:22, 12.23s/it] + +{'loss': 0.4727, 'learning_rate': 1.350623129093103e-05, 'epoch': 0.4} + + 40%|████ | 2984/7378 [10:14:30<14:55:22, 12.23s/it] + 40%|████ | 2985/7378 [10:14:42<14:51:49, 12.18s/it] + +{'loss': 0.4607, 'learning_rate': 1.3502119502921134e-05, 'epoch': 0.4} + + 40%|████ | 2985/7378 [10:14:42<14:51:49, 12.18s/it] + 40%|████ | 2986/7378 [10:14:54<14:54:36, 12.22s/it] + +{'loss': 0.4491, 'learning_rate': 1.3498007039932583e-05, 'epoch': 0.4} + + 40%|████ | 2986/7378 [10:14:54<14:54:36, 12.22s/it] + 40%|████ | 2987/7378 [10:15:06<14:54:39, 12.22s/it] + +{'loss': 0.4648, 'learning_rate': 1.3493893902757997e-05, 'epoch': 0.4} + + 40%|████ | 2987/7378 [10:15:06<14:54:39, 12.22s/it] + 40%|████ | 2988/7378 [10:15:18<14:50:48, 12.18s/it] + +{'loss': 0.5083, 'learning_rate': 1.3489780092190117e-05, 'epoch': 0.4} + + 40%|████ | 2988/7378 [10:15:18<14:50:48, 12.18s/it] + 41%|████ | 2989/7378 [10:15:30<14:48:08, 12.14s/it] + +{'loss': 0.4459, 'learning_rate': 1.3485665609021815e-05, 'epoch': 0.41} + + 41%|████ | 2989/7378 [10:15:30<14:48:08, 12.14s/it] + 41%|████ | 2990/7378 [10:15:43<14:48:54, 12.15s/it] + +{'loss': 0.486, 'learning_rate': 1.3481550454046094e-05, 'epoch': 0.41} + + 41%|████ | 2990/7378 [10:15:43<14:48:54, 12.15s/it] + 41%|████ | 2991/7378 [10:15:55<14:52:53, 12.21s/it] + +{'loss': 0.3717, 'learning_rate': 1.3477434628056081e-05, 'epoch': 0.41} + + 41%|████ | 2991/7378 [10:15:55<14:52:53, 12.21s/it] + 41%|████ | 2992/7378 [10:16:07<14:50:51, 12.19s/it] + +{'loss': 0.4466, 'learning_rate': 1.3473318131845043e-05, 'epoch': 0.41} + + 41%|████ | 2992/7378 [10:16:07<14:50:51, 12.19s/it] + 41%|████ | 2993/7378 [10:16:19<14:49:51, 12.18s/it] + +{'loss': 0.4791, 'learning_rate': 1.3469200966206366e-05, 'epoch': 0.41} + + 41%|████ | 2993/7378 [10:16:19<14:49:51, 12.18s/it] + 41%|████ | 
2994/7378 [10:16:32<14:59:33, 12.31s/it] + +{'loss': 0.4168, 'learning_rate': 1.3465083131933573e-05, 'epoch': 0.41} + + 41%|████ | 2994/7378 [10:16:32<14:59:33, 12.31s/it] + 41%|████ | 2995/7378 [10:16:44<15:00:02, 12.32s/it] + +{'loss': 0.4751, 'learning_rate': 1.346096462982031e-05, 'epoch': 0.41} + + 41%|████ | 2995/7378 [10:16:44<15:00:02, 12.32s/it] + 41%|████ | 2996/7378 [10:16:57<15:01:34, 12.34s/it] + +{'loss': 0.482, 'learning_rate': 1.3456845460660355e-05, 'epoch': 0.41} + + 41%|████ | 2996/7378 [10:16:57<15:01:34, 12.34s/it] + 41%|████ | 2997/7378 [10:17:09<15:06:29, 12.41s/it] + +{'loss': 0.5237, 'learning_rate': 1.3452725625247612e-05, 'epoch': 0.41} + + 41%|████ | 2997/7378 [10:17:09<15:06:29, 12.41s/it] + 41%|████ | 2998/7378 [10:17:21<15:03:42, 12.38s/it] + +{'loss': 0.5609, 'learning_rate': 1.3448605124376111e-05, 'epoch': 0.41} + + 41%|████ | 2998/7378 [10:17:21<15:03:42, 12.38s/it] + 41%|████ | 2999/7378 [10:17:34<15:05:24, 12.41s/it] + +{'loss': 0.4787, 'learning_rate': 1.344448395884002e-05, 'epoch': 0.41} + + 41%|████ | 2999/7378 [10:17:34<15:05:24, 12.41s/it] + 41%|████ | 3000/7378 [10:17:46<15:05:40, 12.41s/it] + +{'loss': 0.4637, 'learning_rate': 1.3440362129433626e-05, 'epoch': 0.41} + + 41%|████ | 3000/7378 [10:17:46<15:05:40, 12.41s/it] + 41%|████ | 3001/7378 [10:17:59<15:13:04, 12.52s/it] + +{'loss': 0.4645, 'learning_rate': 1.3436239636951351e-05, 'epoch': 0.41} + + 41%|████ | 3001/7378 [10:17:59<15:13:04, 12.52s/it] + 41%|████ | 3002/7378 [10:18:11<15:02:28, 12.37s/it] + +{'loss': 0.4616, 'learning_rate': 1.3432116482187734e-05, 'epoch': 0.41} + + 41%|████ | 3002/7378 [10:18:11<15:02:28, 12.37s/it] + 41%|████ | 3003/7378 [10:18:23<14:57:25, 12.31s/it] + +{'loss': 0.5139, 'learning_rate': 1.3427992665937455e-05, 'epoch': 0.41} + + 41%|████ | 3003/7378 [10:18:23<14:57:25, 12.31s/it] + 41%|████ | 3004/7378 [10:18:36<15:02:22, 12.38s/it] + +{'loss': 0.4571, 'learning_rate': 1.3423868188995308e-05, 'epoch': 0.41} + + 41%|████ | 3004/7378 
[10:18:36<15:02:22, 12.38s/it] + 41%|████ | 3005/7378 [10:18:48<15:00:49, 12.36s/it] + +{'loss': 0.4519, 'learning_rate': 1.3419743052156229e-05, 'epoch': 0.41} + + 41%|████ | 3005/7378 [10:18:48<15:00:49, 12.36s/it] + 41%|████ | 3006/7378 [10:19:00<14:58:18, 12.33s/it] + +{'loss': 0.4705, 'learning_rate': 1.341561725621527e-05, 'epoch': 0.41} + + 41%|████ | 3006/7378 [10:19:00<14:58:18, 12.33s/it] + 41%|████ | 3007/7378 [10:19:13<15:02:05, 12.38s/it] + +{'loss': 0.5024, 'learning_rate': 1.3411490801967611e-05, 'epoch': 0.41} + + 41%|████ | 3007/7378 [10:19:13<15:02:05, 12.38s/it] + 41%|████ | 3008/7378 [10:19:25<14:57:43, 12.33s/it] + +{'loss': 0.4425, 'learning_rate': 1.3407363690208567e-05, 'epoch': 0.41} + + 41%|████ | 3008/7378 [10:19:25<14:57:43, 12.33s/it] + 41%|████ | 3009/7378 [10:19:37<14:56:32, 12.31s/it] + +{'loss': 0.4511, 'learning_rate': 1.3403235921733569e-05, 'epoch': 0.41} + + 41%|████ | 3009/7378 [10:19:37<14:56:32, 12.31s/it] + 41%|████ | 3010/7378 [10:19:50<14:53:26, 12.27s/it] + +{'loss': 0.4557, 'learning_rate': 1.339910749733818e-05, 'epoch': 0.41} + + 41%|████ | 3010/7378 [10:19:50<14:53:26, 12.27s/it] + 41%|████ | 3011/7378 [10:20:02<14:51:52, 12.25s/it] + +{'loss': 0.4449, 'learning_rate': 1.3394978417818095e-05, 'epoch': 0.41} + + 41%|████ | 3011/7378 [10:20:02<14:51:52, 12.25s/it] + 41%|████ | 3012/7378 [10:20:14<14:51:32, 12.25s/it] + +{'loss': 0.5148, 'learning_rate': 1.339084868396912e-05, 'epoch': 0.41} + + 41%|████ | 3012/7378 [10:20:14<14:51:32, 12.25s/it] + 41%|████ | 3013/7378 [10:20:27<15:00:27, 12.38s/it] + +{'loss': 0.4393, 'learning_rate': 1.3386718296587205e-05, 'epoch': 0.41} + + 41%|████ | 3013/7378 [10:20:27<15:00:27, 12.38s/it] + 41%|████ | 3014/7378 [10:20:39<14:53:16, 12.28s/it] + +{'loss': 0.484, 'learning_rate': 1.338258725646841e-05, 'epoch': 0.41} + + 41%|████ | 3014/7378 [10:20:39<14:53:16, 12.28s/it] + 41%|████ | 3015/7378 [10:20:51<14:43:57, 12.16s/it] + +{'loss': 0.5877, 'learning_rate': 
1.3378455564408937e-05, 'epoch': 0.41} + + 41%|████ | 3015/7378 [10:20:51<14:43:57, 12.16s/it] + 41%|████ | 3016/7378 [10:21:03<14:53:40, 12.29s/it] + +{'loss': 0.5221, 'learning_rate': 1.3374323221205097e-05, 'epoch': 0.41} + + 41%|████ | 3016/7378 [10:21:03<14:53:40, 12.29s/it] + 41%|████ | 3017/7378 [10:21:16<14:56:22, 12.33s/it] + +{'loss': 0.4916, 'learning_rate': 1.3370190227653339e-05, 'epoch': 0.41} + + 41%|████ | 3017/7378 [10:21:16<14:56:22, 12.33s/it] + 41%|████ | 3018/7378 [10:21:28<14:54:26, 12.31s/it] + +{'loss': 0.4657, 'learning_rate': 1.336605658455023e-05, 'epoch': 0.41} + + 41%|████ | 3018/7378 [10:21:28<14:54:26, 12.31s/it] + 41%|████ | 3019/7378 [10:21:40<14:51:25, 12.27s/it] + +{'loss': 0.4136, 'learning_rate': 1.3361922292692469e-05, 'epoch': 0.41} + + 41%|████ | 3019/7378 [10:21:40<14:51:25, 12.27s/it] + 41%|████ | 3020/7378 [10:21:52<14:54:39, 12.32s/it] + +{'loss': 0.4933, 'learning_rate': 1.3357787352876872e-05, 'epoch': 0.41} + + 41%|████ | 3020/7378 [10:21:52<14:54:39, 12.32s/it] + 41%|████ | 3021/7378 [10:22:05<14:57:41, 12.36s/it] + +{'loss': 0.4492, 'learning_rate': 1.3353651765900382e-05, 'epoch': 0.41} + + 41%|████ | 3021/7378 [10:22:05<14:57:41, 12.36s/it] + 41%|████ | 3022/7378 [10:22:17<14:52:52, 12.30s/it] + +{'loss': 0.4062, 'learning_rate': 1.3349515532560074e-05, 'epoch': 0.41} + + 41%|████ | 3022/7378 [10:22:17<14:52:52, 12.30s/it] + 41%|████ | 3023/7378 [10:22:29<14:51:01, 12.28s/it] + +{'loss': 0.4322, 'learning_rate': 1.3345378653653137e-05, 'epoch': 0.41} + + 41%|████ | 3023/7378 [10:22:29<14:51:01, 12.28s/it] + 41%|████ | 3024/7378 [10:22:42<14:55:33, 12.34s/it] + +{'loss': 0.5009, 'learning_rate': 1.3341241129976897e-05, 'epoch': 0.41} + + 41%|████ | 3024/7378 [10:22:42<14:55:33, 12.34s/it] + 41%|████ | 3025/7378 [10:22:54<14:58:27, 12.38s/it] + +{'loss': 0.5116, 'learning_rate': 1.3337102962328787e-05, 'epoch': 0.41} + + 41%|████ | 3025/7378 [10:22:54<14:58:27, 12.38s/it] + 41%|████ | 3026/7378 [10:23:07<14:58:34, 
12.39s/it] + +{'loss': 0.5042, 'learning_rate': 1.3332964151506382e-05, 'epoch': 0.41} + + 41%|████ | 3026/7378 [10:23:07<14:58:34, 12.39s/it] + 41%|████ | 3027/7378 [10:23:19<14:56:32, 12.36s/it] + +{'loss': 0.4398, 'learning_rate': 1.332882469830737e-05, 'epoch': 0.41} + + 41%|████ | 3027/7378 [10:23:19<14:56:32, 12.36s/it] + 41%|████ | 3028/7378 [10:23:31<14:49:06, 12.26s/it] + +{'loss': 0.4933, 'learning_rate': 1.3324684603529563e-05, 'epoch': 0.41} + + 41%|████ | 3028/7378 [10:23:31<14:49:06, 12.26s/it] + 41%|████ | 3029/7378 [10:23:43<14:42:57, 12.18s/it] + +{'loss': 0.5972, 'learning_rate': 1.3320543867970907e-05, 'epoch': 0.41} + + 41%|████ | 3029/7378 [10:23:43<14:42:57, 12.18s/it] + 41%|████ | 3030/7378 [10:23:56<14:53:46, 12.33s/it] + +{'loss': 0.4992, 'learning_rate': 1.3316402492429454e-05, 'epoch': 0.41} + + 41%|████ | 3030/7378 [10:23:56<14:53:46, 12.33s/it] + 41%|████ | 3031/7378 [10:24:08<14:57:26, 12.39s/it] + +{'loss': 0.4853, 'learning_rate': 1.3312260477703397e-05, 'epoch': 0.41} + + 41%|████ | 3031/7378 [10:24:08<14:57:26, 12.39s/it] + 41%|████ | 3032/7378 [10:24:20<14:49:48, 12.28s/it] + +{'loss': 0.4558, 'learning_rate': 1.3308117824591045e-05, 'epoch': 0.41} + + 41%|████ | 3032/7378 [10:24:20<14:49:48, 12.28s/it] + 41%|████ | 3033/7378 [10:24:33<15:00:26, 12.43s/it] + +{'loss': 0.5391, 'learning_rate': 1.330397453389082e-05, 'epoch': 0.41} + + 41%|████ | 3033/7378 [10:24:33<15:00:26, 12.43s/it] + 41%|████ | 3034/7378 [10:24:45<14:58:38, 12.41s/it] + +{'loss': 0.4369, 'learning_rate': 1.3299830606401285e-05, 'epoch': 0.41} + + 41%|████ | 3034/7378 [10:24:45<14:58:38, 12.41s/it] + 41%|████ | 3035/7378 [10:24:58<14:53:58, 12.35s/it] + +{'loss': 0.4902, 'learning_rate': 1.3295686042921115e-05, 'epoch': 0.41} + + 41%|████ | 3035/7378 [10:24:58<14:53:58, 12.35s/it] + 41%|████ | 3036/7378 [10:25:10<14:49:58, 12.30s/it] + +{'loss': 0.4709, 'learning_rate': 1.3291540844249108e-05, 'epoch': 0.41} + + 41%|████ | 3036/7378 [10:25:10<14:49:58, 
12.30s/it] + 41%|████ | 3037/7378 [10:25:22<14:43:51, 12.22s/it] + +{'loss': 0.3694, 'learning_rate': 1.3287395011184188e-05, 'epoch': 0.41} + + 41%|████ | 3037/7378 [10:25:22<14:43:51, 12.22s/it] + 41%|████ | 3038/7378 [10:25:34<14:47:03, 12.26s/it] + +{'loss': 0.5154, 'learning_rate': 1.32832485445254e-05, 'epoch': 0.41} + + 41%|████ | 3038/7378 [10:25:34<14:47:03, 12.26s/it] + 41%|████ | 3039/7378 [10:25:46<14:42:37, 12.20s/it] + +{'loss': 0.3926, 'learning_rate': 1.327910144507191e-05, 'epoch': 0.41} + + 41%|████ | 3039/7378 [10:25:46<14:42:37, 12.20s/it] + 41%|████ | 3040/7378 [10:25:59<14:57:51, 12.42s/it] + +{'loss': 0.5084, 'learning_rate': 1.3274953713623e-05, 'epoch': 0.41} + + 41%|████ | 3040/7378 [10:25:59<14:57:51, 12.42s/it] + 41%|████ | 3041/7378 [10:26:12<15:09:51, 12.59s/it] + +{'loss': 0.474, 'learning_rate': 1.327080535097809e-05, 'epoch': 0.41} + + 41%|████ | 3041/7378 [10:26:12<15:09:51, 12.59s/it] + 41%|████ | 3042/7378 [10:26:24<14:58:57, 12.44s/it] + +{'loss': 0.4305, 'learning_rate': 1.3266656357936705e-05, 'epoch': 0.41} + + 41%|████ | 3042/7378 [10:26:24<14:58:57, 12.44s/it] + 41%|████ | 3043/7378 [10:26:36<14:53:27, 12.37s/it] + +{'loss': 0.4369, 'learning_rate': 1.3262506735298505e-05, 'epoch': 0.41} + + 41%|████ | 3043/7378 [10:26:36<14:53:27, 12.37s/it] + 41%|████▏ | 3044/7378 [10:26:49<14:52:33, 12.36s/it] + +{'loss': 0.4277, 'learning_rate': 1.3258356483863258e-05, 'epoch': 0.41} + + 41%|████▏ | 3044/7378 [10:26:49<14:52:33, 12.36s/it] + 41%|████▏ | 3045/7378 [10:27:01<14:51:14, 12.34s/it] + +{'loss': 0.4424, 'learning_rate': 1.3254205604430862e-05, 'epoch': 0.41} + + 41%|████▏ | 3045/7378 [10:27:01<14:51:14, 12.34s/it] + 41%|████▏ | 3046/7378 [10:27:13<14:52:27, 12.36s/it] + +{'loss': 0.4247, 'learning_rate': 1.3250054097801334e-05, 'epoch': 0.41} + + 41%|████▏ | 3046/7378 [10:27:13<14:52:27, 12.36s/it] + 41%|████▏ | 3047/7378 [10:27:26<15:00:40, 12.48s/it] + +{'loss': 0.4747, 'learning_rate': 1.3245901964774817e-05, 'epoch': 0.41} 
+ + 41%|████▏ | 3047/7378 [10:27:26<15:00:40, 12.48s/it] + 41%|████▏ | 3048/7378 [10:27:39<14:57:10, 12.43s/it] + +{'loss': 0.4423, 'learning_rate': 1.3241749206151561e-05, 'epoch': 0.41} + + 41%|████▏ | 3048/7378 [10:27:39<14:57:10, 12.43s/it] + 41%|████▏ | 3049/7378 [10:27:51<15:03:17, 12.52s/it] + +{'loss': 0.5188, 'learning_rate': 1.323759582273195e-05, 'epoch': 0.41} + + 41%|████▏ | 3049/7378 [10:27:51<15:03:17, 12.52s/it] + 41%|████▏ | 3050/7378 [10:28:04<15:01:13, 12.49s/it] + +{'loss': 0.4756, 'learning_rate': 1.3233441815316486e-05, 'epoch': 0.41} + + 41%|████▏ | 3050/7378 [10:28:04<15:01:13, 12.49s/it] + 41%|████▏ | 3051/7378 [10:28:16<14:55:13, 12.41s/it] + +{'loss': 0.4611, 'learning_rate': 1.3229287184705782e-05, 'epoch': 0.41} + + 41%|████▏ | 3051/7378 [10:28:16<14:55:13, 12.41s/it] + 41%|████▏ | 3052/7378 [10:28:28<14:47:21, 12.31s/it] + +{'loss': 0.4706, 'learning_rate': 1.3225131931700583e-05, 'epoch': 0.41} + + 41%|████▏ | 3052/7378 [10:28:28<14:47:21, 12.31s/it] + 41%|████▏ | 3053/7378 [10:28:40<14:50:50, 12.36s/it] + +{'loss': 0.4477, 'learning_rate': 1.3220976057101749e-05, 'epoch': 0.41} + + 41%|████▏ | 3053/7378 [10:28:40<14:50:50, 12.36s/it] + 41%|████▏ | 3054/7378 [10:28:53<14:55:09, 12.42s/it] + +{'loss': 0.4829, 'learning_rate': 1.3216819561710255e-05, 'epoch': 0.41} + + 41%|████▏ | 3054/7378 [10:28:53<14:55:09, 12.42s/it] + 41%|████▏ | 3055/7378 [10:29:06<14:58:44, 12.47s/it] + +{'loss': 0.4819, 'learning_rate': 1.3212662446327204e-05, 'epoch': 0.41} + + 41%|████▏ | 3055/7378 [10:29:06<14:58:44, 12.47s/it] + 41%|████▏ | 3056/7378 [10:29:18<14:52:41, 12.39s/it] + +{'loss': 0.4628, 'learning_rate': 1.3208504711753815e-05, 'epoch': 0.41} + + 41%|████▏ | 3056/7378 [10:29:18<14:52:41, 12.39s/it] + 41%|████▏ | 3057/7378 [10:29:30<14:53:43, 12.41s/it] + +{'loss': 0.4646, 'learning_rate': 1.3204346358791426e-05, 'epoch': 0.41} + + 41%|████▏ | 3057/7378 [10:29:30<14:53:43, 12.41s/it] + 41%|████▏ | 3058/7378 [10:29:42<14:44:47, 12.29s/it] + 
+{'loss': 0.4531, 'learning_rate': 1.3200187388241492e-05, 'epoch': 0.41} + + 41%|████▏ | 3058/7378 [10:29:42<14:44:47, 12.29s/it] + 41%|████▏ | 3059/7378 [10:29:54<14:41:17, 12.24s/it] + +{'loss': 0.5143, 'learning_rate': 1.3196027800905596e-05, 'epoch': 0.41} + + 41%|████▏ | 3059/7378 [10:29:54<14:41:17, 12.24s/it] + 41%|████▏ | 3060/7378 [10:30:07<14:38:28, 12.21s/it] + +{'loss': 0.4437, 'learning_rate': 1.3191867597585422e-05, 'epoch': 0.41} + + 41%|████▏ | 3060/7378 [10:30:07<14:38:28, 12.21s/it] + 41%|████▏ | 3061/7378 [10:30:19<14:37:15, 12.19s/it] + +{'loss': 0.4013, 'learning_rate': 1.3187706779082796e-05, 'epoch': 0.41} + + 41%|████▏ | 3061/7378 [10:30:19<14:37:15, 12.19s/it] + 42%|████▏ | 3062/7378 [10:30:31<14:35:59, 12.18s/it] + +{'loss': 0.4333, 'learning_rate': 1.3183545346199641e-05, 'epoch': 0.42} + + 42%|████▏ | 3062/7378 [10:30:31<14:35:59, 12.18s/it] + 42%|████▏ | 3063/7378 [10:30:43<14:38:46, 12.22s/it] + +{'loss': 0.4318, 'learning_rate': 1.3179383299738016e-05, 'epoch': 0.42} + + 42%|████▏ | 3063/7378 [10:30:43<14:38:46, 12.22s/it] + 42%|████▏ | 3064/7378 [10:30:56<14:49:36, 12.37s/it] + +{'loss': 0.4791, 'learning_rate': 1.3175220640500084e-05, 'epoch': 0.42} + + 42%|████▏ | 3064/7378 [10:30:56<14:49:36, 12.37s/it] + 42%|████▏ | 3065/7378 [10:31:08<14:41:02, 12.26s/it] + +{'loss': 0.4067, 'learning_rate': 1.3171057369288134e-05, 'epoch': 0.42} + + 42%|████▏ | 3065/7378 [10:31:08<14:41:02, 12.26s/it] + 42%|████▏ | 3066/7378 [10:31:20<14:40:53, 12.26s/it] + +{'loss': 0.4366, 'learning_rate': 1.3166893486904572e-05, 'epoch': 0.42} + + 42%|████▏ | 3066/7378 [10:31:20<14:40:53, 12.26s/it] + 42%|████▏ | 3067/7378 [10:31:32<14:41:02, 12.26s/it] + +{'loss': 0.5155, 'learning_rate': 1.3162728994151923e-05, 'epoch': 0.42} + + 42%|████▏ | 3067/7378 [10:31:32<14:41:02, 12.26s/it] + 42%|████▏ | 3068/7378 [10:31:45<14:37:49, 12.22s/it] + +{'loss': 0.4203, 'learning_rate': 1.3158563891832825e-05, 'epoch': 0.42} + + 42%|████▏ | 3068/7378 [10:31:45<14:37:49, 
12.22s/it] + 42%|████▏ | 3069/7378 [10:31:56<14:30:14, 12.12s/it] + +{'loss': 0.5151, 'learning_rate': 1.3154398180750038e-05, 'epoch': 0.42} + + 42%|████▏ | 3069/7378 [10:31:56<14:30:14, 12.12s/it] + 42%|████▏ | 3070/7378 [10:32:09<14:32:00, 12.14s/it] + +{'loss': 0.4976, 'learning_rate': 1.315023186170643e-05, 'epoch': 0.42} + + 42%|████▏ | 3070/7378 [10:32:09<14:32:00, 12.14s/it] + 42%|████▏ | 3071/7378 [10:32:21<14:38:35, 12.24s/it] + +{'loss': 0.4838, 'learning_rate': 1.3146064935505008e-05, 'epoch': 0.42} + + 42%|████▏ | 3071/7378 [10:32:21<14:38:35, 12.24s/it] + 42%|████▏ | 3072/7378 [10:32:34<14:42:45, 12.30s/it] + +{'loss': 0.4838, 'learning_rate': 1.314189740294887e-05, 'epoch': 0.42} + + 42%|████▏ | 3072/7378 [10:32:34<14:42:45, 12.30s/it] + 42%|████▏ | 3073/7378 [10:32:46<14:46:21, 12.35s/it] + +{'loss': 0.4494, 'learning_rate': 1.3137729264841248e-05, 'epoch': 0.42} + + 42%|████▏ | 3073/7378 [10:32:46<14:46:21, 12.35s/it] + 42%|████▏ | 3074/7378 [10:32:58<14:44:13, 12.33s/it] + +{'loss': 0.5101, 'learning_rate': 1.3133560521985485e-05, 'epoch': 0.42} + + 42%|████▏ | 3074/7378 [10:32:58<14:44:13, 12.33s/it] + 42%|████▏ | 3075/7378 [10:33:11<14:48:45, 12.39s/it] + +{'loss': 0.4031, 'learning_rate': 1.3129391175185035e-05, 'epoch': 0.42} + + 42%|████▏ | 3075/7378 [10:33:11<14:48:45, 12.39s/it] + 42%|████▏ | 3076/7378 [10:33:23<14:46:27, 12.36s/it] + +{'loss': 0.4926, 'learning_rate': 1.3125221225243483e-05, 'epoch': 0.42} + + 42%|████▏ | 3076/7378 [10:33:23<14:46:27, 12.36s/it] + 42%|████▏ | 3077/7378 [10:33:36<14:53:05, 12.46s/it] + +{'loss': 0.4946, 'learning_rate': 1.3121050672964514e-05, 'epoch': 0.42} + + 42%|████▏ | 3077/7378 [10:33:36<14:53:05, 12.46s/it] + 42%|████▏ | 3078/7378 [10:33:48<14:46:38, 12.37s/it] + +{'loss': 0.4182, 'learning_rate': 1.3116879519151944e-05, 'epoch': 0.42} + + 42%|████▏ | 3078/7378 [10:33:48<14:46:38, 12.37s/it] + 42%|████▏ | 3079/7378 [10:34:00<14:47:56, 12.39s/it] + +{'loss': 0.4699, 'learning_rate': 
1.3112707764609689e-05, 'epoch': 0.42} + + 42%|████▏ | 3079/7378 [10:34:00<14:47:56, 12.39s/it] + 42%|████▏ | 3080/7378 [10:34:13<14:44:07, 12.34s/it] + +{'loss': 0.4998, 'learning_rate': 1.3108535410141795e-05, 'epoch': 0.42} + + 42%|████▏ | 3080/7378 [10:34:13<14:44:07, 12.34s/it] + 42%|████▏ | 3081/7378 [10:34:25<14:42:05, 12.32s/it] + +{'loss': 0.421, 'learning_rate': 1.3104362456552418e-05, 'epoch': 0.42} + + 42%|████▏ | 3081/7378 [10:34:25<14:42:05, 12.32s/it] + 42%|████▏ | 3082/7378 [10:34:37<14:37:27, 12.25s/it] + +{'loss': 0.5162, 'learning_rate': 1.310018890464583e-05, 'epoch': 0.42} + + 42%|████▏ | 3082/7378 [10:34:37<14:37:27, 12.25s/it] + 42%|████▏ | 3083/7378 [10:34:49<14:39:51, 12.29s/it] + +{'loss': 0.4894, 'learning_rate': 1.3096014755226414e-05, 'epoch': 0.42} + + 42%|████▏ | 3083/7378 [10:34:49<14:39:51, 12.29s/it] + 42%|████▏ | 3084/7378 [10:35:02<14:39:11, 12.28s/it] + +{'loss': 0.5093, 'learning_rate': 1.3091840009098674e-05, 'epoch': 0.42} + + 42%|████▏ | 3084/7378 [10:35:02<14:39:11, 12.28s/it] + 42%|████▏ | 3085/7378 [10:35:14<14:40:54, 12.31s/it] + +{'loss': 0.5155, 'learning_rate': 1.3087664667067226e-05, 'epoch': 0.42} + + 42%|████▏ | 3085/7378 [10:35:14<14:40:54, 12.31s/it] + 42%|████▏ | 3086/7378 [10:35:26<14:43:33, 12.35s/it] + +{'loss': 0.4998, 'learning_rate': 1.3083488729936802e-05, 'epoch': 0.42} + + 42%|████▏ | 3086/7378 [10:35:26<14:43:33, 12.35s/it] + 42%|████▏ | 3087/7378 [10:35:39<14:48:15, 12.42s/it] + +{'loss': 0.476, 'learning_rate': 1.3079312198512249e-05, 'epoch': 0.42} + + 42%|████▏ | 3087/7378 [10:35:39<14:48:15, 12.42s/it] + 42%|████▏ | 3088/7378 [10:35:52<14:53:49, 12.50s/it] + +{'loss': 0.4445, 'learning_rate': 1.3075135073598525e-05, 'epoch': 0.42} + + 42%|████▏ | 3088/7378 [10:35:52<14:53:49, 12.50s/it] + 42%|████▏ | 3089/7378 [10:36:04<14:39:23, 12.30s/it] + +{'loss': 0.4145, 'learning_rate': 1.3070957356000716e-05, 'epoch': 0.42} + + 42%|████▏ | 3089/7378 [10:36:04<14:39:23, 12.30s/it] + 42%|████▏ | 3090/7378 
[10:36:16<14:37:07, 12.27s/it] + +{'loss': 0.4573, 'learning_rate': 1.3066779046523997e-05, 'epoch': 0.42} + + 42%|████▏ | 3090/7378 [10:36:16<14:37:07, 12.27s/it] + 42%|████▏ | 3091/7378 [10:36:28<14:36:08, 12.26s/it] + +{'loss': 0.4129, 'learning_rate': 1.3062600145973678e-05, 'epoch': 0.42} + + 42%|████▏ | 3091/7378 [10:36:28<14:36:08, 12.26s/it] + 42%|████▏ | 3092/7378 [10:36:40<14:34:03, 12.24s/it] + +{'loss': 0.4743, 'learning_rate': 1.305842065515518e-05, 'epoch': 0.42} + + 42%|████▏ | 3092/7378 [10:36:40<14:34:03, 12.24s/it] + 42%|████▏ | 3093/7378 [10:36:53<14:37:29, 12.29s/it] + +{'loss': 0.3736, 'learning_rate': 1.3054240574874028e-05, 'epoch': 0.42} + + 42%|████▏ | 3093/7378 [10:36:53<14:37:29, 12.29s/it] + 42%|████▏ | 3094/7378 [10:37:05<14:40:54, 12.34s/it] + +{'loss': 0.4658, 'learning_rate': 1.3050059905935876e-05, 'epoch': 0.42} + + 42%|████▏ | 3094/7378 [10:37:05<14:40:54, 12.34s/it] + 42%|████▏ | 3095/7378 [10:37:18<14:47:42, 12.44s/it] + +{'loss': 0.3967, 'learning_rate': 1.3045878649146476e-05, 'epoch': 0.42} + + 42%|████▏ | 3095/7378 [10:37:18<14:47:42, 12.44s/it] + 42%|████▏ | 3096/7378 [10:37:30<14:52:30, 12.51s/it] + +{'loss': 0.5126, 'learning_rate': 1.3041696805311697e-05, 'epoch': 0.42} + + 42%|████▏ | 3096/7378 [10:37:30<14:52:30, 12.51s/it] + 42%|████▏ | 3097/7378 [10:37:43<14:51:05, 12.49s/it] + +{'loss': 0.4901, 'learning_rate': 1.3037514375237527e-05, 'epoch': 0.42} + + 42%|████▏ | 3097/7378 [10:37:43<14:51:05, 12.49s/it] + 42%|████▏ | 3098/7378 [10:37:55<14:43:54, 12.39s/it] + +{'loss': 0.4369, 'learning_rate': 1.3033331359730065e-05, 'epoch': 0.42} + + 42%|████▏ | 3098/7378 [10:37:55<14:43:54, 12.39s/it] + 42%|████▏ | 3099/7378 [10:38:07<14:34:28, 12.26s/it] + +{'loss': 0.5569, 'learning_rate': 1.3029147759595522e-05, 'epoch': 0.42} + + 42%|████▏ | 3099/7378 [10:38:07<14:34:28, 12.26s/it] + 42%|████▏ | 3100/7378 [10:38:19<14:36:40, 12.30s/it] + +{'loss': 0.5057, 'learning_rate': 1.302496357564022e-05, 'epoch': 0.42} + + 42%|████▏ 
| 3100/7378 [10:38:19<14:36:40, 12.30s/it] + 42%|████▏ | 3101/7378 [10:38:32<14:41:44, 12.37s/it] + +{'loss': 0.4293, 'learning_rate': 1.3020778808670595e-05, 'epoch': 0.42} + + 42%|████▏ | 3101/7378 [10:38:32<14:41:44, 12.37s/it] + 42%|████▏ | 3102/7378 [10:38:44<14:41:42, 12.37s/it] + +{'loss': 0.4149, 'learning_rate': 1.3016593459493194e-05, 'epoch': 0.42} + + 42%|████▏ | 3102/7378 [10:38:44<14:41:42, 12.37s/it] + 42%|████▏ | 3103/7378 [10:38:57<14:41:03, 12.37s/it] + +{'loss': 0.4407, 'learning_rate': 1.301240752891468e-05, 'epoch': 0.42} + + 42%|████▏ | 3103/7378 [10:38:57<14:41:03, 12.37s/it] + 42%|████▏ | 3104/7378 [10:39:09<14:45:00, 12.42s/it] + +{'loss': 0.4673, 'learning_rate': 1.3008221017741826e-05, 'epoch': 0.42} + + 42%|████▏ | 3104/7378 [10:39:09<14:45:00, 12.42s/it] + 42%|████▏ | 3105/7378 [10:39:21<14:40:48, 12.37s/it] + +{'loss': 0.5194, 'learning_rate': 1.3004033926781512e-05, 'epoch': 0.42} + + 42%|████▏ | 3105/7378 [10:39:21<14:40:48, 12.37s/it] + 42%|████▏ | 3106/7378 [10:39:34<14:38:33, 12.34s/it] + +{'loss': 0.4869, 'learning_rate': 1.299984625684074e-05, 'epoch': 0.42} + + 42%|████▏ | 3106/7378 [10:39:34<14:38:33, 12.34s/it] + 42%|████▏ | 3107/7378 [10:39:46<14:38:43, 12.34s/it] + +{'loss': 0.4739, 'learning_rate': 1.2995658008726611e-05, 'epoch': 0.42} + + 42%|████▏ | 3107/7378 [10:39:46<14:38:43, 12.34s/it] + 42%|████▏ | 3108/7378 [10:39:58<14:41:05, 12.38s/it] + +{'loss': 0.4915, 'learning_rate': 1.299146918324635e-05, 'epoch': 0.42} + + 42%|████▏ | 3108/7378 [10:39:59<14:41:05, 12.38s/it] + 42%|████▏ | 3109/7378 [10:40:11<14:45:46, 12.45s/it] + +{'loss': 0.4056, 'learning_rate': 1.2987279781207285e-05, 'epoch': 0.42} + + 42%|████▏ | 3109/7378 [10:40:11<14:45:46, 12.45s/it] + 42%|████▏ | 3110/7378 [10:40:23<14:35:58, 12.31s/it] + +{'loss': 0.4143, 'learning_rate': 1.2983089803416857e-05, 'epoch': 0.42} + + 42%|████▏ | 3110/7378 [10:40:23<14:35:58, 12.31s/it] + 42%|████▏ | 3111/7378 [10:40:35<14:33:03, 12.28s/it] + +{'loss': 0.4492, 
'learning_rate': 1.2978899250682619e-05, 'epoch': 0.42} + + 42%|████▏ | 3111/7378 [10:40:35<14:33:03, 12.28s/it] + 42%|████▏ | 3112/7378 [10:40:48<14:32:23, 12.27s/it] + +{'loss': 0.4264, 'learning_rate': 1.2974708123812239e-05, 'epoch': 0.42} + + 42%|████▏ | 3112/7378 [10:40:48<14:32:23, 12.27s/it] + 42%|████▏ | 3113/7378 [10:41:00<14:39:18, 12.37s/it] + +{'loss': 0.4448, 'learning_rate': 1.2970516423613482e-05, 'epoch': 0.42} + + 42%|████▏ | 3113/7378 [10:41:00<14:39:18, 12.37s/it] + 42%|████▏ | 3114/7378 [10:41:13<14:45:36, 12.46s/it] + +{'loss': 0.4911, 'learning_rate': 1.2966324150894238e-05, 'epoch': 0.42} + + 42%|████▏ | 3114/7378 [10:41:13<14:45:36, 12.46s/it] + 42%|████▏ | 3115/7378 [10:41:25<14:45:28, 12.46s/it] + +{'loss': 0.5129, 'learning_rate': 1.2962131306462504e-05, 'epoch': 0.42} + + 42%|████▏ | 3115/7378 [10:41:25<14:45:28, 12.46s/it] + 42%|████▏ | 3116/7378 [10:41:38<14:40:09, 12.39s/it] + +{'loss': 0.4386, 'learning_rate': 1.295793789112638e-05, 'epoch': 0.42} + + 42%|████▏ | 3116/7378 [10:41:38<14:40:09, 12.39s/it] + 42%|████▏ | 3117/7378 [10:41:50<14:37:00, 12.35s/it] + +{'loss': 0.4614, 'learning_rate': 1.2953743905694086e-05, 'epoch': 0.42} + + 42%|████▏ | 3117/7378 [10:41:50<14:37:00, 12.35s/it] + 42%|████▏ | 3118/7378 [10:42:02<14:31:27, 12.27s/it] + +{'loss': 0.5111, 'learning_rate': 1.2949549350973942e-05, 'epoch': 0.42} + + 42%|████▏ | 3118/7378 [10:42:02<14:31:27, 12.27s/it] + 42%|████▏ | 3119/7378 [10:42:14<14:29:32, 12.25s/it] + +{'loss': 0.4999, 'learning_rate': 1.2945354227774385e-05, 'epoch': 0.42} + + 42%|████▏ | 3119/7378 [10:42:14<14:29:32, 12.25s/it] + 42%|████▏ | 3120/7378 [10:42:26<14:33:20, 12.31s/it] + +{'loss': 0.471, 'learning_rate': 1.2941158536903959e-05, 'epoch': 0.42} + + 42%|████▏ | 3120/7378 [10:42:27<14:33:20, 12.31s/it] + 42%|████▏ | 3121/7378 [10:42:38<14:21:47, 12.15s/it] + +{'loss': 0.4395, 'learning_rate': 1.2936962279171318e-05, 'epoch': 0.42} + + 42%|████▏ | 3121/7378 [10:42:38<14:21:47, 12.15s/it] + 
42%|████▏ | 3122/7378 [10:42:51<14:26:51, 12.22s/it] + +{'loss': 0.5672, 'learning_rate': 1.2932765455385228e-05, 'epoch': 0.42} + + 42%|████▏ | 3122/7378 [10:42:51<14:26:51, 12.22s/it] + 42%|████▏ | 3123/7378 [10:43:03<14:26:38, 12.22s/it] + +{'loss': 0.4805, 'learning_rate': 1.2928568066354555e-05, 'epoch': 0.42} + + 42%|████▏ | 3123/7378 [10:43:03<14:26:38, 12.22s/it] + 42%|████▏ | 3124/7378 [10:43:15<14:29:27, 12.26s/it] + +{'loss': 0.4091, 'learning_rate': 1.2924370112888283e-05, 'epoch': 0.42} + + 42%|████▏ | 3124/7378 [10:43:15<14:29:27, 12.26s/it] + 42%|████▏ | 3125/7378 [10:43:27<14:26:13, 12.22s/it] + +{'loss': 0.4871, 'learning_rate': 1.2920171595795504e-05, 'epoch': 0.42} + + 42%|████▏ | 3125/7378 [10:43:27<14:26:13, 12.22s/it] + 42%|████▏ | 3126/7378 [10:43:39<14:21:39, 12.16s/it] + +{'loss': 0.5086, 'learning_rate': 1.2915972515885411e-05, 'epoch': 0.42} + + 42%|████▏ | 3126/7378 [10:43:39<14:21:39, 12.16s/it] + 42%|████▏ | 3127/7378 [10:43:51<14:18:16, 12.11s/it] + +{'loss': 0.4077, 'learning_rate': 1.2911772873967317e-05, 'epoch': 0.42} + + 42%|████▏ | 3127/7378 [10:43:51<14:18:16, 12.11s/it] + 42%|████▏ | 3128/7378 [10:44:04<14:17:59, 12.11s/it] + +{'loss': 0.4187, 'learning_rate': 1.2907572670850628e-05, 'epoch': 0.42} + + 42%|████▏ | 3128/7378 [10:44:04<14:17:59, 12.11s/it] + 42%|████▏ | 3129/7378 [10:44:15<14:13:01, 12.05s/it] + +{'loss': 0.473, 'learning_rate': 1.290337190734488e-05, 'epoch': 0.42} + + 42%|████▏ | 3129/7378 [10:44:15<14:13:01, 12.05s/it] + 42%|████▏ | 3130/7378 [10:44:28<14:21:09, 12.16s/it] + +{'loss': 0.4868, 'learning_rate': 1.2899170584259693e-05, 'epoch': 0.42} + + 42%|████▏ | 3130/7378 [10:44:28<14:21:09, 12.16s/it] + 42%|████▏ | 3131/7378 [10:44:41<14:34:51, 12.36s/it] + +{'loss': 0.4535, 'learning_rate': 1.2894968702404813e-05, 'epoch': 0.42} + + 42%|████▏ | 3131/7378 [10:44:41<14:34:51, 12.36s/it] + 42%|████▏ | 3132/7378 [10:44:53<14:25:55, 12.24s/it] + +{'loss': 0.4313, 'learning_rate': 1.2890766262590082e-05, 
'epoch': 0.42} + + 42%|████▏ | 3132/7378 [10:44:53<14:25:55, 12.24s/it] + 42%|████▏ | 3133/7378 [10:45:05<14:25:09, 12.23s/it] + +{'loss': 0.4669, 'learning_rate': 1.288656326562546e-05, 'epoch': 0.42} + + 42%|████▏ | 3133/7378 [10:45:05<14:25:09, 12.23s/it] + 42%|████▏ | 3134/7378 [10:45:17<14:24:10, 12.22s/it] + +{'loss': 0.4918, 'learning_rate': 1.2882359712321007e-05, 'epoch': 0.42} + + 42%|████▏ | 3134/7378 [10:45:17<14:24:10, 12.22s/it] + 42%|████▏ | 3135/7378 [10:45:29<14:18:05, 12.13s/it] + +{'loss': 0.4403, 'learning_rate': 1.2878155603486885e-05, 'epoch': 0.42} + + 42%|████▏ | 3135/7378 [10:45:29<14:18:05, 12.13s/it] + 43%|████▎ | 3136/7378 [10:45:41<14:26:33, 12.26s/it] + +{'loss': 0.4514, 'learning_rate': 1.2873950939933382e-05, 'epoch': 0.43} + + 43%|████▎ | 3136/7378 [10:45:41<14:26:33, 12.26s/it] + 43%|████▎ | 3137/7378 [10:45:54<14:26:49, 12.26s/it] + +{'loss': 0.4795, 'learning_rate': 1.2869745722470872e-05, 'epoch': 0.43} + + 43%|████▎ | 3137/7378 [10:45:54<14:26:49, 12.26s/it] + 43%|████▎ | 3138/7378 [10:46:06<14:23:32, 12.22s/it] + +{'loss': 0.4351, 'learning_rate': 1.2865539951909849e-05, 'epoch': 0.43} + + 43%|████▎ | 3138/7378 [10:46:06<14:23:32, 12.22s/it] + 43%|████▎ | 3139/7378 [10:46:18<14:19:58, 12.17s/it] + +{'loss': 0.476, 'learning_rate': 1.2861333629060911e-05, 'epoch': 0.43} + + 43%|████▎ | 3139/7378 [10:46:18<14:19:58, 12.17s/it] + 43%|████▎ | 3140/7378 [10:46:30<14:27:15, 12.28s/it] + +{'loss': 0.4826, 'learning_rate': 1.2857126754734752e-05, 'epoch': 0.43} + + 43%|████▎ | 3140/7378 [10:46:30<14:27:15, 12.28s/it] + 43%|████▎ | 3141/7378 [10:46:43<14:22:30, 12.21s/it] + +{'loss': 0.4455, 'learning_rate': 1.285291932974219e-05, 'epoch': 0.43} + + 43%|████▎ | 3141/7378 [10:46:43<14:22:30, 12.21s/it] + 43%|████▎ | 3142/7378 [10:46:55<14:28:02, 12.30s/it] + +{'loss': 0.4773, 'learning_rate': 1.2848711354894136e-05, 'epoch': 0.43} + + 43%|████▎ | 3142/7378 [10:46:55<14:28:02, 12.30s/it] + 43%|████▎ | 3143/7378 [10:47:07<14:25:13, 
12.26s/it] + +{'loss': 0.4528, 'learning_rate': 1.2844502831001615e-05, 'epoch': 0.43} + + 43%|████▎ | 3143/7378 [10:47:07<14:25:13, 12.26s/it] + 43%|████▎ | 3144/7378 [10:47:19<14:23:37, 12.24s/it] + +{'loss': 0.4497, 'learning_rate': 1.2840293758875751e-05, 'epoch': 0.43} + + 43%|████▎ | 3144/7378 [10:47:19<14:23:37, 12.24s/it] + 43%|████▎ | 3145/7378 [10:47:32<14:30:13, 12.33s/it] + +{'loss': 0.4849, 'learning_rate': 1.2836084139327775e-05, 'epoch': 0.43} + + 43%|████▎ | 3145/7378 [10:47:32<14:30:13, 12.33s/it] + 43%|████▎ | 3146/7378 [10:47:44<14:26:20, 12.28s/it] + +{'loss': 0.4609, 'learning_rate': 1.2831873973169029e-05, 'epoch': 0.43} + + 43%|████▎ | 3146/7378 [10:47:44<14:26:20, 12.28s/it] + 43%|████▎ | 3147/7378 [10:47:56<14:28:32, 12.32s/it] + +{'loss': 0.4717, 'learning_rate': 1.2827663261210956e-05, 'epoch': 0.43} + + 43%|████▎ | 3147/7378 [10:47:57<14:28:32, 12.32s/it] + 43%|████▎ | 3148/7378 [10:48:09<14:24:58, 12.27s/it] + +{'loss': 0.4564, 'learning_rate': 1.2823452004265103e-05, 'epoch': 0.43} + + 43%|████▎ | 3148/7378 [10:48:09<14:24:58, 12.27s/it] + 43%|████▎ | 3149/7378 [10:48:21<14:20:58, 12.22s/it] + +{'loss': 0.396, 'learning_rate': 1.2819240203143123e-05, 'epoch': 0.43} + + 43%|████▎ | 3149/7378 [10:48:21<14:20:58, 12.22s/it] + 43%|████▎ | 3150/7378 [10:48:33<14:20:32, 12.21s/it] + +{'loss': 0.4566, 'learning_rate': 1.2815027858656776e-05, 'epoch': 0.43} + + 43%|████▎ | 3150/7378 [10:48:33<14:20:32, 12.21s/it] + 43%|████▎ | 3151/7378 [10:48:45<14:23:14, 12.25s/it] + +{'loss': 0.409, 'learning_rate': 1.2810814971617927e-05, 'epoch': 0.43} + + 43%|████▎ | 3151/7378 [10:48:45<14:23:14, 12.25s/it] + 43%|████▎ | 3152/7378 [10:48:58<14:24:58, 12.28s/it] + +{'loss': 0.4791, 'learning_rate': 1.2806601542838543e-05, 'epoch': 0.43} + + 43%|████▎ | 3152/7378 [10:48:58<14:24:58, 12.28s/it] + 43%|████▎ | 3153/7378 [10:49:10<14:24:00, 12.27s/it] + +{'loss': 0.5133, 'learning_rate': 1.2802387573130692e-05, 'epoch': 0.43} + + 43%|████▎ | 3153/7378 
[10:49:10<14:24:00, 12.27s/it] + 43%|████▎ | 3154/7378 [10:49:22<14:24:52, 12.29s/it] + +{'loss': 0.4398, 'learning_rate': 1.279817306330656e-05, 'epoch': 0.43} + + 43%|████▎ | 3154/7378 [10:49:22<14:24:52, 12.29s/it] + 43%|████▎ | 3155/7378 [10:49:34<14:23:39, 12.27s/it] + +{'loss': 0.5019, 'learning_rate': 1.279395801417842e-05, 'epoch': 0.43} + + 43%|████▎ | 3155/7378 [10:49:34<14:23:39, 12.27s/it] + 43%|████▎ | 3156/7378 [10:49:47<14:19:17, 12.21s/it] + +{'loss': 0.5116, 'learning_rate': 1.2789742426558656e-05, 'epoch': 0.43} + + 43%|████▎ | 3156/7378 [10:49:47<14:19:17, 12.21s/it] + 43%|████▎ | 3157/7378 [10:49:59<14:15:32, 12.16s/it] + +{'loss': 0.4368, 'learning_rate': 1.278552630125976e-05, 'epoch': 0.43} + + 43%|████▎ | 3157/7378 [10:49:59<14:15:32, 12.16s/it] + 43%|████▎ | 3158/7378 [10:50:11<14:24:06, 12.29s/it] + +{'loss': 0.4486, 'learning_rate': 1.2781309639094323e-05, 'epoch': 0.43} + + 43%|████▎ | 3158/7378 [10:50:11<14:24:06, 12.29s/it] + 43%|████▎ | 3159/7378 [10:50:24<14:29:44, 12.37s/it] + +{'loss': 0.4773, 'learning_rate': 1.2777092440875045e-05, 'epoch': 0.43} + + 43%|████▎ | 3159/7378 [10:50:24<14:29:44, 12.37s/it] + 43%|████▎ | 3160/7378 [10:50:36<14:37:45, 12.49s/it] + +{'loss': 0.4371, 'learning_rate': 1.277287470741472e-05, 'epoch': 0.43} + + 43%|████▎ | 3160/7378 [10:50:36<14:37:45, 12.49s/it] + 43%|████▎ | 3161/7378 [10:50:49<14:36:53, 12.48s/it] + +{'loss': 0.4835, 'learning_rate': 1.2768656439526248e-05, 'epoch': 0.43} + + 43%|████▎ | 3161/7378 [10:50:49<14:36:53, 12.48s/it] + 43%|████▎ | 3162/7378 [10:51:01<14:33:53, 12.44s/it] + +{'loss': 0.4902, 'learning_rate': 1.2764437638022638e-05, 'epoch': 0.43} + + 43%|████▎ | 3162/7378 [10:51:01<14:33:53, 12.44s/it] + 43%|████▎ | 3163/7378 [10:51:14<14:31:51, 12.41s/it] + +{'loss': 0.4798, 'learning_rate': 1.2760218303716995e-05, 'epoch': 0.43} + + 43%|████▎ | 3163/7378 [10:51:14<14:31:51, 12.41s/it] + 43%|████▎ | 3164/7378 [10:51:26<14:31:21, 12.41s/it] + +{'loss': 0.4811, 'learning_rate': 
1.2755998437422536e-05, 'epoch': 0.43} + + 43%|████▎ | 3164/7378 [10:51:26<14:31:21, 12.41s/it] + 43%|████▎ | 3165/7378 [10:51:38<14:32:00, 12.42s/it] + +{'loss': 0.4506, 'learning_rate': 1.2751778039952564e-05, 'epoch': 0.43} + + 43%|████▎ | 3165/7378 [10:51:38<14:32:00, 12.42s/it] + 43%|████▎ | 3166/7378 [10:51:51<14:31:15, 12.41s/it] + +{'loss': 0.4217, 'learning_rate': 1.2747557112120503e-05, 'epoch': 0.43} + + 43%|████▎ | 3166/7378 [10:51:51<14:31:15, 12.41s/it] + 43%|████▎ | 3167/7378 [10:52:04<14:40:04, 12.54s/it] + +{'loss': 0.4571, 'learning_rate': 1.2743335654739866e-05, 'epoch': 0.43} + + 43%|████▎ | 3167/7378 [10:52:04<14:40:04, 12.54s/it] + 43%|████▎ | 3168/7378 [10:52:16<14:33:08, 12.44s/it] + +{'loss': 0.452, 'learning_rate': 1.2739113668624277e-05, 'epoch': 0.43} + + 43%|████▎ | 3168/7378 [10:52:16<14:33:08, 12.44s/it] + 43%|████▎ | 3169/7378 [10:52:28<14:30:17, 12.41s/it] + +{'loss': 0.4695, 'learning_rate': 1.2734891154587454e-05, 'epoch': 0.43} + + 43%|████▎ | 3169/7378 [10:52:28<14:30:17, 12.41s/it] + 43%|████▎ | 3170/7378 [10:52:41<14:32:07, 12.44s/it] + +{'loss': 0.3844, 'learning_rate': 1.2730668113443218e-05, 'epoch': 0.43} + + 43%|████▎ | 3170/7378 [10:52:41<14:32:07, 12.44s/it] + 43%|████▎ | 3171/7378 [10:52:53<14:27:43, 12.38s/it] + +{'loss': 0.4878, 'learning_rate': 1.2726444546005501e-05, 'epoch': 0.43} + + 43%|████▎ | 3171/7378 [10:52:53<14:27:43, 12.38s/it] + 43%|████▎ | 3172/7378 [10:53:05<14:26:33, 12.36s/it] + +{'loss': 0.4761, 'learning_rate': 1.2722220453088323e-05, 'epoch': 0.43} + + 43%|████▎ | 3172/7378 [10:53:05<14:26:33, 12.36s/it] + 43%|████▎ | 3173/7378 [10:53:18<14:31:04, 12.43s/it] + +{'loss': 0.5246, 'learning_rate': 1.2717995835505817e-05, 'epoch': 0.43} + + 43%|████▎ | 3173/7378 [10:53:18<14:31:04, 12.43s/it] + 43%|████▎ | 3174/7378 [10:53:30<14:30:47, 12.43s/it] + +{'loss': 0.4902, 'learning_rate': 1.2713770694072207e-05, 'epoch': 0.43} + + 43%|████▎ | 3174/7378 [10:53:30<14:30:47, 12.43s/it] + 43%|████▎ | 3175/7378 
[10:53:43<14:26:53, 12.38s/it] + +{'loss': 0.4546, 'learning_rate': 1.2709545029601827e-05, 'epoch': 0.43} + + 43%|████▎ | 3175/7378 [10:53:43<14:26:53, 12.38s/it] + 43%|████▎ | 3176/7378 [10:53:55<14:26:53, 12.38s/it] + +{'loss': 0.4195, 'learning_rate': 1.2705318842909104e-05, 'epoch': 0.43} + + 43%|████▎ | 3176/7378 [10:53:55<14:26:53, 12.38s/it] + 43%|████▎ | 3177/7378 [10:54:07<14:30:19, 12.43s/it] + +{'loss': 0.4244, 'learning_rate': 1.2701092134808572e-05, 'epoch': 0.43} + + 43%|████▎ | 3177/7378 [10:54:07<14:30:19, 12.43s/it] + 43%|████▎ | 3178/7378 [10:54:20<14:25:45, 12.37s/it] + +{'loss': 0.4052, 'learning_rate': 1.2696864906114863e-05, 'epoch': 0.43} + + 43%|████▎ | 3178/7378 [10:54:20<14:25:45, 12.37s/it] + 43%|████▎ | 3179/7378 [10:54:32<14:21:08, 12.30s/it] + +{'loss': 0.497, 'learning_rate': 1.2692637157642705e-05, 'epoch': 0.43} + + 43%|████▎ | 3179/7378 [10:54:32<14:21:08, 12.30s/it] + 43%|████▎ | 3180/7378 [10:54:44<14:20:08, 12.29s/it] + +{'loss': 0.4593, 'learning_rate': 1.2688408890206934e-05, 'epoch': 0.43} + + 43%|████▎ | 3180/7378 [10:54:44<14:20:08, 12.29s/it] + 43%|████▎ | 3181/7378 [10:54:57<14:22:29, 12.33s/it] + +{'loss': 0.4701, 'learning_rate': 1.2684180104622484e-05, 'epoch': 0.43} + + 43%|████▎ | 3181/7378 [10:54:57<14:22:29, 12.33s/it] + 43%|████▎ | 3182/7378 [10:55:09<14:25:01, 12.37s/it] + +{'loss': 0.473, 'learning_rate': 1.267995080170438e-05, 'epoch': 0.43} + + 43%|████▎ | 3182/7378 [10:55:09<14:25:01, 12.37s/it] + 43%|████▎ | 3183/7378 [10:55:21<14:25:36, 12.38s/it] + +{'loss': 0.4958, 'learning_rate': 1.267572098226776e-05, 'epoch': 0.43} + + 43%|████▎ | 3183/7378 [10:55:21<14:25:36, 12.38s/it] + 43%|████▎ | 3184/7378 [10:55:34<14:25:34, 12.38s/it] + +{'loss': 0.4627, 'learning_rate': 1.2671490647127856e-05, 'epoch': 0.43} + + 43%|████▎ | 3184/7378 [10:55:34<14:25:34, 12.38s/it] + 43%|████▎ | 3185/7378 [10:55:46<14:21:28, 12.33s/it] + +{'loss': 0.4306, 'learning_rate': 1.2667259797099995e-05, 'epoch': 0.43} + + 43%|████▎ | 
3185/7378 [10:55:46<14:21:28, 12.33s/it] + 43%|████▎ | 3186/7378 [10:55:58<14:21:17, 12.33s/it] + +{'loss': 0.4648, 'learning_rate': 1.2663028432999606e-05, 'epoch': 0.43} + + 43%|████▎ | 3186/7378 [10:55:58<14:21:17, 12.33s/it] + 43%|████▎ | 3187/7378 [10:56:11<14:17:43, 12.28s/it] + +{'loss': 0.4493, 'learning_rate': 1.2658796555642225e-05, 'epoch': 0.43} + + 43%|████▎ | 3187/7378 [10:56:11<14:17:43, 12.28s/it] + 43%|████▎ | 3188/7378 [10:56:23<14:14:10, 12.23s/it] + +{'loss': 0.4726, 'learning_rate': 1.2654564165843473e-05, 'epoch': 0.43} + + 43%|████▎ | 3188/7378 [10:56:23<14:14:10, 12.23s/it] + 43%|████▎ | 3189/7378 [10:56:35<14:16:39, 12.27s/it] + +{'loss': 0.4892, 'learning_rate': 1.2650331264419083e-05, 'epoch': 0.43} + + 43%|████▎ | 3189/7378 [10:56:35<14:16:39, 12.27s/it] + 43%|████▎ | 3190/7378 [10:56:47<14:14:41, 12.24s/it] + +{'loss': 0.4708, 'learning_rate': 1.2646097852184874e-05, 'epoch': 0.43} + + 43%|████▎ | 3190/7378 [10:56:47<14:14:41, 12.24s/it] + 43%|████▎ | 3191/7378 [10:56:59<14:08:55, 12.17s/it] + +{'loss': 0.4872, 'learning_rate': 1.2641863929956772e-05, 'epoch': 0.43} + + 43%|████▎ | 3191/7378 [10:56:59<14:08:55, 12.17s/it] + 43%|████▎ | 3192/7378 [10:57:12<14:16:35, 12.28s/it] + +{'loss': 0.4466, 'learning_rate': 1.2637629498550803e-05, 'epoch': 0.43} + + 43%|████▎ | 3192/7378 [10:57:12<14:16:35, 12.28s/it] + 43%|████▎ | 3193/7378 [10:57:24<14:15:16, 12.26s/it] + +{'loss': 0.4485, 'learning_rate': 1.263339455878308e-05, 'epoch': 0.43} + + 43%|████▎ | 3193/7378 [10:57:24<14:15:16, 12.26s/it] + 43%|████▎ | 3194/7378 [10:57:36<14:12:12, 12.22s/it] + +{'loss': 0.4556, 'learning_rate': 1.2629159111469831e-05, 'epoch': 0.43} + + 43%|████▎ | 3194/7378 [10:57:36<14:12:12, 12.22s/it] + 43%|████▎ | 3195/7378 [10:57:48<14:14:21, 12.25s/it] + +{'loss': 0.4743, 'learning_rate': 1.2624923157427363e-05, 'epoch': 0.43} + + 43%|████▎ | 3195/7378 [10:57:48<14:14:21, 12.25s/it] + 43%|████▎ | 3196/7378 [10:58:01<14:29:23, 12.47s/it] + +{'loss': 0.4576, 
'learning_rate': 1.2620686697472093e-05, 'epoch': 0.43} + + 43%|████▎ | 3196/7378 [10:58:01<14:29:23, 12.47s/it] + 43%|████▎ | 3197/7378 [10:58:13<14:19:14, 12.33s/it] + +{'loss': 0.4604, 'learning_rate': 1.2616449732420532e-05, 'epoch': 0.43} + + 43%|████▎ | 3197/7378 [10:58:13<14:19:14, 12.33s/it] + 43%|████▎ | 3198/7378 [10:58:26<14:33:50, 12.54s/it] + +{'loss': 0.4353, 'learning_rate': 1.2612212263089294e-05, 'epoch': 0.43} + + 43%|████▎ | 3198/7378 [10:58:26<14:33:50, 12.54s/it] + 43%|████▎ | 3199/7378 [10:58:39<14:33:11, 12.54s/it] + +{'loss': 0.4448, 'learning_rate': 1.2607974290295078e-05, 'epoch': 0.43} + + 43%|████▎ | 3199/7378 [10:58:39<14:33:11, 12.54s/it] + 43%|████▎ | 3200/7378 [10:58:51<14:29:42, 12.49s/it] + +{'loss': 0.4633, 'learning_rate': 1.2603735814854687e-05, 'epoch': 0.43} + + 43%|████▎ | 3200/7378 [10:58:51<14:29:42, 12.49s/it] + 43%|████▎ | 3201/7378 [10:59:04<14:29:07, 12.48s/it] + +{'loss': 0.4415, 'learning_rate': 1.259949683758502e-05, 'epoch': 0.43} + + 43%|████▎ | 3201/7378 [10:59:04<14:29:07, 12.48s/it] + 43%|████▎ | 3202/7378 [10:59:16<14:27:16, 12.46s/it] + +{'loss': 0.4708, 'learning_rate': 1.259525735930308e-05, 'epoch': 0.43} + + 43%|████▎ | 3202/7378 [10:59:16<14:27:16, 12.46s/it] + 43%|████▎ | 3203/7378 [10:59:28<14:19:39, 12.35s/it] + +{'loss': 0.5131, 'learning_rate': 1.2591017380825959e-05, 'epoch': 0.43} + + 43%|████▎ | 3203/7378 [10:59:28<14:19:39, 12.35s/it] + 43%|████▎ | 3204/7378 [10:59:41<14:18:51, 12.35s/it] + +{'loss': 0.4562, 'learning_rate': 1.2586776902970841e-05, 'epoch': 0.43} + + 43%|████▎ | 3204/7378 [10:59:41<14:18:51, 12.35s/it] + 43%|████▎ | 3205/7378 [10:59:53<14:18:11, 12.34s/it] + +{'loss': 0.4744, 'learning_rate': 1.258253592655501e-05, 'epoch': 0.43} + + 43%|████▎ | 3205/7378 [10:59:53<14:18:11, 12.34s/it] + 43%|████▎ | 3206/7378 [11:00:05<14:16:31, 12.32s/it] + +{'loss': 0.4329, 'learning_rate': 1.2578294452395858e-05, 'epoch': 0.43} + + 43%|████▎ | 3206/7378 [11:00:05<14:16:31, 12.32s/it] + 
43%|████▎ | 3207/7378 [11:00:18<14:23:37, 12.42s/it] + +{'loss': 0.4743, 'learning_rate': 1.2574052481310854e-05, 'epoch': 0.43} + + 43%|████▎ | 3207/7378 [11:00:18<14:23:37, 12.42s/it] + 43%|████▎ | 3208/7378 [11:00:30<14:26:09, 12.46s/it] + +{'loss': 0.4168, 'learning_rate': 1.2569810014117575e-05, 'epoch': 0.43} + + 43%|████▎ | 3208/7378 [11:00:30<14:26:09, 12.46s/it] + 43%|████▎ | 3209/7378 [11:00:43<14:20:09, 12.38s/it] + +{'loss': 0.4791, 'learning_rate': 1.2565567051633685e-05, 'epoch': 0.43} + + 43%|████▎ | 3209/7378 [11:00:43<14:20:09, 12.38s/it] + 44%|████▎ | 3210/7378 [11:00:55<14:18:25, 12.36s/it] + +{'loss': 0.4513, 'learning_rate': 1.2561323594676957e-05, 'epoch': 0.44} + + 44%|████▎ | 3210/7378 [11:00:55<14:18:25, 12.36s/it] + 44%|████▎ | 3211/7378 [11:01:07<14:17:46, 12.35s/it] + +{'loss': 0.5328, 'learning_rate': 1.2557079644065247e-05, 'epoch': 0.44} + + 44%|████▎ | 3211/7378 [11:01:07<14:17:46, 12.35s/it] + 44%|████▎ | 3212/7378 [11:01:20<14:17:10, 12.35s/it] + +{'loss': 0.4423, 'learning_rate': 1.2552835200616506e-05, 'epoch': 0.44} + + 44%|████▎ | 3212/7378 [11:01:20<14:17:10, 12.35s/it] + 44%|████▎ | 3213/7378 [11:01:32<14:23:44, 12.44s/it] + +{'loss': 0.4882, 'learning_rate': 1.254859026514879e-05, 'epoch': 0.44} + + 44%|████▎ | 3213/7378 [11:01:32<14:23:44, 12.44s/it] + 44%|████▎ | 3214/7378 [11:01:45<14:19:52, 12.39s/it] + +{'loss': 0.4795, 'learning_rate': 1.2544344838480239e-05, 'epoch': 0.44} + + 44%|████▎ | 3214/7378 [11:01:45<14:19:52, 12.39s/it] + 44%|████▎ | 3215/7378 [11:01:57<14:21:47, 12.42s/it] + +{'loss': 0.4585, 'learning_rate': 1.2540098921429096e-05, 'epoch': 0.44} + + 44%|████▎ | 3215/7378 [11:01:57<14:21:47, 12.42s/it] + 44%|████▎ | 3216/7378 [11:02:09<14:13:57, 12.31s/it] + +{'loss': 0.3856, 'learning_rate': 1.2535852514813691e-05, 'epoch': 0.44} + + 44%|████▎ | 3216/7378 [11:02:09<14:13:57, 12.31s/it] + 44%|████▎ | 3217/7378 [11:02:21<14:14:52, 12.33s/it] + +{'loss': 0.4443, 'learning_rate': 1.2531605619452458e-05, 
'epoch': 0.44} + + 44%|████▎ | 3217/7378 [11:02:21<14:14:52, 12.33s/it] + 44%|████▎ | 3218/7378 [11:02:34<14:17:42, 12.37s/it] + +{'loss': 0.4832, 'learning_rate': 1.2527358236163916e-05, 'epoch': 0.44} + + 44%|████▎ | 3218/7378 [11:02:34<14:17:42, 12.37s/it] + 44%|████▎ | 3219/7378 [11:02:47<14:22:54, 12.45s/it] + +{'loss': 0.4781, 'learning_rate': 1.252311036576668e-05, 'epoch': 0.44} + + 44%|████▎ | 3219/7378 [11:02:47<14:22:54, 12.45s/it] + 44%|████▎ | 3220/7378 [11:02:59<14:25:57, 12.50s/it] + +{'loss': 0.4486, 'learning_rate': 1.2518862009079464e-05, 'epoch': 0.44} + + 44%|████▎ | 3220/7378 [11:02:59<14:25:57, 12.50s/it] + 44%|████▎ | 3221/7378 [11:03:12<14:24:33, 12.48s/it] + +{'loss': 0.4803, 'learning_rate': 1.251461316692107e-05, 'epoch': 0.44} + + 44%|████▎ | 3221/7378 [11:03:12<14:24:33, 12.48s/it] + 44%|████▎ | 3222/7378 [11:03:23<14:10:02, 12.27s/it] + +{'loss': 0.5166, 'learning_rate': 1.2510363840110396e-05, 'epoch': 0.44} + + 44%|████▎ | 3222/7378 [11:03:23<14:10:02, 12.27s/it] + 44%|████▎ | 3223/7378 [11:03:36<14:14:08, 12.33s/it] + +{'loss': 0.5041, 'learning_rate': 1.2506114029466432e-05, 'epoch': 0.44} + + 44%|████▎ | 3223/7378 [11:03:36<14:14:08, 12.33s/it] + 44%|████▎ | 3224/7378 [11:03:48<14:11:28, 12.30s/it] + +{'loss': 0.4233, 'learning_rate': 1.2501863735808267e-05, 'epoch': 0.44} + + 44%|████▎ | 3224/7378 [11:03:48<14:11:28, 12.30s/it] + 44%|████▎ | 3225/7378 [11:04:00<14:11:39, 12.30s/it] + +{'loss': 0.4553, 'learning_rate': 1.2497612959955077e-05, 'epoch': 0.44} + + 44%|████▎ | 3225/7378 [11:04:00<14:11:39, 12.30s/it] + 44%|████▎ | 3226/7378 [11:04:13<14:11:37, 12.31s/it] + +{'loss': 0.4604, 'learning_rate': 1.2493361702726126e-05, 'epoch': 0.44} + + 44%|████▎ | 3226/7378 [11:04:13<14:11:37, 12.31s/it] + 44%|████▎ | 3227/7378 [11:04:25<14:02:43, 12.18s/it] + +{'loss': 0.4557, 'learning_rate': 1.2489109964940784e-05, 'epoch': 0.44} + + 44%|████▎ | 3227/7378 [11:04:25<14:02:43, 12.18s/it] + 44%|████▍ | 3228/7378 [11:04:37<14:06:02, 
12.23s/it] + +{'loss': 0.4422, 'learning_rate': 1.2484857747418504e-05, 'epoch': 0.44} + + 44%|████▍ | 3228/7378 [11:04:37<14:06:02, 12.23s/it] + 44%|████▍ | 3229/7378 [11:04:50<14:15:10, 12.37s/it] + +{'loss': 0.408, 'learning_rate': 1.2480605050978838e-05, 'epoch': 0.44} + + 44%|████▍ | 3229/7378 [11:04:50<14:15:10, 12.37s/it] + 44%|████▍ | 3230/7378 [11:05:02<14:16:05, 12.38s/it] + +{'loss': 0.511, 'learning_rate': 1.2476351876441419e-05, 'epoch': 0.44} + + 44%|████▍ | 3230/7378 [11:05:02<14:16:05, 12.38s/it] + 44%|████▍ | 3231/7378 [11:05:14<14:10:07, 12.30s/it] + +{'loss': 0.4622, 'learning_rate': 1.2472098224625989e-05, 'epoch': 0.44} + + 44%|████▍ | 3231/7378 [11:05:14<14:10:07, 12.30s/it] + 44%|████▍ | 3232/7378 [11:05:26<14:01:37, 12.18s/it] + +{'loss': 0.455, 'learning_rate': 1.2467844096352366e-05, 'epoch': 0.44} + + 44%|████▍ | 3232/7378 [11:05:26<14:01:37, 12.18s/it] + 44%|████▍ | 3233/7378 [11:05:38<14:07:02, 12.26s/it] + +{'loss': 0.4593, 'learning_rate': 1.2463589492440468e-05, 'epoch': 0.44} + + 44%|████▍ | 3233/7378 [11:05:38<14:07:02, 12.26s/it] + 44%|████▍ | 3234/7378 [11:05:51<14:06:37, 12.26s/it] + +{'loss': 0.4947, 'learning_rate': 1.2459334413710306e-05, 'epoch': 0.44} + + 44%|████▍ | 3234/7378 [11:05:51<14:06:37, 12.26s/it] + 44%|████▍ | 3235/7378 [11:06:03<14:07:23, 12.27s/it] + +{'loss': 0.426, 'learning_rate': 1.2455078860981978e-05, 'epoch': 0.44} + + 44%|████▍ | 3235/7378 [11:06:03<14:07:23, 12.27s/it] + 44%|████▍ | 3236/7378 [11:06:16<14:11:03, 12.33s/it] + +{'loss': 0.485, 'learning_rate': 1.2450822835075672e-05, 'epoch': 0.44} + + 44%|████▍ | 3236/7378 [11:06:16<14:11:03, 12.33s/it] + 44%|████▍ | 3237/7378 [11:06:28<14:04:04, 12.23s/it] + +{'loss': 0.4265, 'learning_rate': 1.2446566336811675e-05, 'epoch': 0.44} + + 44%|████▍ | 3237/7378 [11:06:28<14:04:04, 12.23s/it] + 44%|████▍ | 3238/7378 [11:06:40<13:59:41, 12.17s/it] + +{'loss': 0.4241, 'learning_rate': 1.244230936701036e-05, 'epoch': 0.44} + + 44%|████▍ | 3238/7378 
[11:06:40<13:59:41, 12.17s/it] + 44%|████▍ | 3239/7378 [11:06:52<14:06:37, 12.27s/it] + +{'loss': 0.5097, 'learning_rate': 1.2438051926492184e-05, 'epoch': 0.44} + + 44%|████▍ | 3239/7378 [11:06:52<14:06:37, 12.27s/it] + 44%|████▍ | 3240/7378 [11:07:05<14:20:41, 12.48s/it] + +{'loss': 0.3802, 'learning_rate': 1.2433794016077713e-05, 'epoch': 0.44} + + 44%|████▍ | 3240/7378 [11:07:05<14:20:41, 12.48s/it] + 44%|████▍ | 3241/7378 [11:07:17<14:08:49, 12.31s/it] + +{'loss': 0.4554, 'learning_rate': 1.2429535636587587e-05, 'epoch': 0.44} + + 44%|████▍ | 3241/7378 [11:07:17<14:08:49, 12.31s/it] + 44%|████▍ | 3242/7378 [11:07:30<14:21:03, 12.49s/it] + +{'loss': 0.5049, 'learning_rate': 1.242527678884254e-05, 'epoch': 0.44} + + 44%|████▍ | 3242/7378 [11:07:30<14:21:03, 12.49s/it] + 44%|████▍ | 3243/7378 [11:07:42<14:12:56, 12.38s/it] + +{'loss': 0.4185, 'learning_rate': 1.2421017473663399e-05, 'epoch': 0.44} + + 44%|████▍ | 3243/7378 [11:07:42<14:12:56, 12.38s/it] + 44%|████▍ | 3244/7378 [11:07:54<14:10:51, 12.35s/it] + +{'loss': 0.447, 'learning_rate': 1.2416757691871082e-05, 'epoch': 0.44} + + 44%|████▍ | 3244/7378 [11:07:54<14:10:51, 12.35s/it] + 44%|████▍ | 3245/7378 [11:08:06<14:04:09, 12.25s/it] + +{'loss': 0.4917, 'learning_rate': 1.2412497444286596e-05, 'epoch': 0.44} + + 44%|████▍ | 3245/7378 [11:08:06<14:04:09, 12.25s/it] + 44%|████▍ | 3246/7378 [11:08:19<14:08:43, 12.32s/it] + +{'loss': 0.4504, 'learning_rate': 1.2408236731731036e-05, 'epoch': 0.44} + + 44%|████▍ | 3246/7378 [11:08:19<14:08:43, 12.32s/it] + 44%|████▍ | 3247/7378 [11:08:31<14:09:23, 12.34s/it] + +{'loss': 0.529, 'learning_rate': 1.2403975555025584e-05, 'epoch': 0.44} + + 44%|████▍ | 3247/7378 [11:08:31<14:09:23, 12.34s/it] + 44%|████▍ | 3248/7378 [11:08:43<14:09:33, 12.34s/it] + +{'loss': 0.3734, 'learning_rate': 1.2399713914991522e-05, 'epoch': 0.44} + + 44%|████▍ | 3248/7378 [11:08:43<14:09:33, 12.34s/it] + 44%|████▍ | 3249/7378 [11:08:56<14:12:28, 12.39s/it] + +{'loss': 0.4461, 'learning_rate': 
1.2395451812450208e-05, 'epoch': 0.44} + + 44%|████▍ | 3249/7378 [11:08:56<14:12:28, 12.39s/it] + 44%|████▍ | 3250/7378 [11:09:08<14:10:49, 12.37s/it] + +{'loss': 0.4739, 'learning_rate': 1.23911892482231e-05, 'epoch': 0.44} + + 44%|████▍ | 3250/7378 [11:09:08<14:10:49, 12.37s/it] + 44%|████▍ | 3251/7378 [11:09:21<14:10:56, 12.37s/it] + +{'loss': 0.4577, 'learning_rate': 1.2386926223131734e-05, 'epoch': 0.44} + + 44%|████▍ | 3251/7378 [11:09:21<14:10:56, 12.37s/it] + 44%|████▍ | 3252/7378 [11:09:33<14:08:18, 12.34s/it] + +{'loss': 0.3905, 'learning_rate': 1.238266273799775e-05, 'epoch': 0.44} + + 44%|████▍ | 3252/7378 [11:09:33<14:08:18, 12.34s/it] + 44%|████▍ | 3253/7378 [11:09:45<14:06:45, 12.32s/it] + +{'loss': 0.5153, 'learning_rate': 1.237839879364286e-05, 'epoch': 0.44} + + 44%|████▍ | 3253/7378 [11:09:45<14:06:45, 12.32s/it] + 44%|████▍ | 3254/7378 [11:09:57<14:01:03, 12.24s/it] + +{'loss': 0.4886, 'learning_rate': 1.2374134390888882e-05, 'epoch': 0.44} + + 44%|████▍ | 3254/7378 [11:09:57<14:01:03, 12.24s/it] + 44%|████▍ | 3255/7378 [11:10:09<14:00:22, 12.23s/it] + +{'loss': 0.4479, 'learning_rate': 1.2369869530557703e-05, 'epoch': 0.44} + + 44%|████▍ | 3255/7378 [11:10:09<14:00:22, 12.23s/it] + 44%|████▍ | 3256/7378 [11:10:22<14:04:04, 12.29s/it] + +{'loss': 0.4598, 'learning_rate': 1.2365604213471312e-05, 'epoch': 0.44} + + 44%|████▍ | 3256/7378 [11:10:22<14:04:04, 12.29s/it] + 44%|████▍ | 3257/7378 [11:10:34<14:04:46, 12.30s/it] + +{'loss': 0.4641, 'learning_rate': 1.2361338440451783e-05, 'epoch': 0.44} + + 44%|████▍ | 3257/7378 [11:10:34<14:04:46, 12.30s/it] + 44%|████▍ | 3258/7378 [11:10:46<14:00:14, 12.24s/it] + +{'loss': 0.4417, 'learning_rate': 1.2357072212321272e-05, 'epoch': 0.44} + + 44%|████▍ | 3258/7378 [11:10:46<14:00:14, 12.24s/it] + 44%|████▍ | 3259/7378 [11:10:59<14:00:44, 12.25s/it] + +{'loss': 0.4729, 'learning_rate': 1.2352805529902036e-05, 'epoch': 0.44} + + 44%|████▍ | 3259/7378 [11:10:59<14:00:44, 12.25s/it] + 44%|████▍ | 3260/7378 
[11:11:11<14:03:11, 12.29s/it] + +{'loss': 0.529, 'learning_rate': 1.2348538394016403e-05, 'epoch': 0.44} + + 44%|████▍ | 3260/7378 [11:11:11<14:03:11, 12.29s/it] + 44%|████▍ | 3261/7378 [11:11:24<14:09:57, 12.39s/it] + +{'loss': 0.4754, 'learning_rate': 1.2344270805486804e-05, 'epoch': 0.44} + + 44%|████▍ | 3261/7378 [11:11:24<14:09:57, 12.39s/it] + 44%|████▍ | 3262/7378 [11:11:36<14:06:54, 12.35s/it] + +{'loss': 0.4543, 'learning_rate': 1.2340002765135741e-05, 'epoch': 0.44} + + 44%|████▍ | 3262/7378 [11:11:36<14:06:54, 12.35s/it] + 44%|████▍ | 3263/7378 [11:11:48<13:58:51, 12.23s/it] + +{'loss': 0.4658, 'learning_rate': 1.2335734273785822e-05, 'epoch': 0.44} + + 44%|████▍ | 3263/7378 [11:11:48<13:58:51, 12.23s/it] + 44%|████▍ | 3264/7378 [11:12:00<13:56:18, 12.20s/it] + +{'loss': 0.5349, 'learning_rate': 1.2331465332259724e-05, 'epoch': 0.44} + + 44%|████▍ | 3264/7378 [11:12:00<13:56:18, 12.20s/it] + 44%|████▍ | 3265/7378 [11:12:12<13:52:29, 12.14s/it] + +{'loss': 0.4779, 'learning_rate': 1.2327195941380221e-05, 'epoch': 0.44} + + 44%|████▍ | 3265/7378 [11:12:12<13:52:29, 12.14s/it] + 44%|████▍ | 3266/7378 [11:12:24<13:58:31, 12.24s/it] + +{'loss': 0.4847, 'learning_rate': 1.2322926101970171e-05, 'epoch': 0.44} + + 44%|████▍ | 3266/7378 [11:12:24<13:58:31, 12.24s/it] + 44%|████▍ | 3267/7378 [11:12:37<14:04:21, 12.32s/it] + +{'loss': 0.4457, 'learning_rate': 1.2318655814852519e-05, 'epoch': 0.44} + + 44%|████▍ | 3267/7378 [11:12:37<14:04:21, 12.32s/it] + 44%|████▍ | 3268/7378 [11:12:49<14:08:31, 12.39s/it] + +{'loss': 0.4489, 'learning_rate': 1.2314385080850297e-05, 'epoch': 0.44} + + 44%|████▍ | 3268/7378 [11:12:49<14:08:31, 12.39s/it] + 44%|████▍ | 3269/7378 [11:13:02<14:02:25, 12.30s/it] + +{'loss': 0.4376, 'learning_rate': 1.2310113900786622e-05, 'epoch': 0.44} + + 44%|████▍ | 3269/7378 [11:13:02<14:02:25, 12.30s/it] + 44%|████▍ | 3270/7378 [11:13:14<13:58:45, 12.25s/it] + +{'loss': 0.4565, 'learning_rate': 1.2305842275484694e-05, 'epoch': 0.44} + + 44%|████▍ 
| 3270/7378 [11:13:14<13:58:45, 12.25s/it] + 44%|████▍ | 3271/7378 [11:13:26<13:56:44, 12.22s/it] + +{'loss': 0.4429, 'learning_rate': 1.2301570205767805e-05, 'epoch': 0.44} + + 44%|████▍ | 3271/7378 [11:13:26<13:56:44, 12.22s/it] + 44%|████▍ | 3272/7378 [11:13:38<13:59:02, 12.26s/it] + +{'loss': 0.4823, 'learning_rate': 1.2297297692459326e-05, 'epoch': 0.44} + + 44%|████▍ | 3272/7378 [11:13:38<13:59:02, 12.26s/it] + 44%|████▍ | 3273/7378 [11:13:50<13:57:37, 12.24s/it] + +{'loss': 0.4389, 'learning_rate': 1.2293024736382724e-05, 'epoch': 0.44} + + 44%|████▍ | 3273/7378 [11:13:50<13:57:37, 12.24s/it] + 44%|████▍ | 3274/7378 [11:14:03<13:56:19, 12.23s/it] + +{'loss': 0.4711, 'learning_rate': 1.2288751338361535e-05, 'epoch': 0.44} + + 44%|████▍ | 3274/7378 [11:14:03<13:56:19, 12.23s/it] + 44%|████▍ | 3275/7378 [11:14:15<13:51:55, 12.17s/it] + +{'loss': 0.4592, 'learning_rate': 1.2284477499219399e-05, 'epoch': 0.44} + + 44%|████▍ | 3275/7378 [11:14:15<13:51:55, 12.17s/it] + 44%|████▍ | 3276/7378 [11:14:27<13:56:35, 12.24s/it] + +{'loss': 0.4287, 'learning_rate': 1.2280203219780025e-05, 'epoch': 0.44} + + 44%|████▍ | 3276/7378 [11:14:27<13:56:35, 12.24s/it] + 44%|████▍ | 3277/7378 [11:14:39<13:57:41, 12.26s/it] + +{'loss': 0.511, 'learning_rate': 1.2275928500867211e-05, 'epoch': 0.44} + + 44%|████▍ | 3277/7378 [11:14:39<13:57:41, 12.26s/it] + 44%|████▍ | 3278/7378 [11:14:52<14:02:07, 12.32s/it] + +{'loss': 0.4466, 'learning_rate': 1.227165334330485e-05, 'epoch': 0.44} + + 44%|████▍ | 3278/7378 [11:14:52<14:02:07, 12.32s/it] + 44%|████▍ | 3279/7378 [11:15:04<14:05:15, 12.37s/it] + +{'loss': 0.483, 'learning_rate': 1.2267377747916907e-05, 'epoch': 0.44} + + 44%|████▍ | 3279/7378 [11:15:04<14:05:15, 12.37s/it] + 44%|████▍ | 3280/7378 [11:15:16<14:00:50, 12.31s/it] + +{'loss': 0.444, 'learning_rate': 1.2263101715527437e-05, 'epoch': 0.44} + + 44%|████▍ | 3280/7378 [11:15:16<14:00:50, 12.31s/it] + 44%|████▍ | 3281/7378 [11:15:29<13:55:57, 12.24s/it] + +{'loss': 0.4779, 
'learning_rate': 1.2258825246960577e-05, 'epoch': 0.44} + + 44%|████▍ | 3281/7378 [11:15:29<13:55:57, 12.24s/it] + 44%|████▍ | 3282/7378 [11:15:41<13:53:23, 12.21s/it] + +{'loss': 0.4101, 'learning_rate': 1.2254548343040552e-05, 'epoch': 0.44} + + 44%|████▍ | 3282/7378 [11:15:41<13:53:23, 12.21s/it] + 44%|████▍ | 3283/7378 [11:15:53<13:58:53, 12.29s/it] + +{'loss': 0.4798, 'learning_rate': 1.2250271004591663e-05, 'epoch': 0.44} + + 44%|████▍ | 3283/7378 [11:15:53<13:58:53, 12.29s/it] + 45%|████▍ | 3284/7378 [11:16:05<13:49:08, 12.15s/it] + +{'loss': 0.5025, 'learning_rate': 1.2245993232438308e-05, 'epoch': 0.45} + + 45%|████▍ | 3284/7378 [11:16:05<13:49:08, 12.15s/it] + 45%|████▍ | 3285/7378 [11:16:17<13:47:34, 12.13s/it] + +{'loss': 0.4641, 'learning_rate': 1.2241715027404952e-05, 'epoch': 0.45} + + 45%|████▍ | 3285/7378 [11:16:17<13:47:34, 12.13s/it] + 45%|████▍ | 3286/7378 [11:16:29<13:49:22, 12.16s/it] + +{'loss': 0.5222, 'learning_rate': 1.2237436390316158e-05, 'epoch': 0.45} + + 45%|████▍ | 3286/7378 [11:16:29<13:49:22, 12.16s/it] + 45%|████▍ | 3287/7378 [11:16:42<13:51:56, 12.20s/it] + +{'loss': 0.4764, 'learning_rate': 1.2233157321996565e-05, 'epoch': 0.45} + + 45%|████▍ | 3287/7378 [11:16:42<13:51:56, 12.20s/it] + 45%|████▍ | 3288/7378 [11:16:54<13:54:47, 12.25s/it] + +{'loss': 0.5178, 'learning_rate': 1.2228877823270891e-05, 'epoch': 0.45} + + 45%|████▍ | 3288/7378 [11:16:54<13:54:47, 12.25s/it] + 45%|████▍ | 3289/7378 [11:17:07<14:02:25, 12.36s/it] + +{'loss': 0.4852, 'learning_rate': 1.222459789496395e-05, 'epoch': 0.45} + + 45%|████▍ | 3289/7378 [11:17:07<14:02:25, 12.36s/it] + 45%|████▍ | 3290/7378 [11:17:19<14:03:49, 12.38s/it] + +{'loss': 0.5171, 'learning_rate': 1.222031753790063e-05, 'epoch': 0.45} + + 45%|████▍ | 3290/7378 [11:17:19<14:03:49, 12.38s/it] + 45%|████▍ | 3291/7378 [11:17:31<13:57:17, 12.29s/it] + +{'loss': 0.4386, 'learning_rate': 1.2216036752905897e-05, 'epoch': 0.45} + + 45%|████▍ | 3291/7378 [11:17:31<13:57:17, 12.29s/it] + 
45%|████▍ | 3292/7378 [11:17:44<14:05:37, 12.42s/it] + +{'loss': 0.4694, 'learning_rate': 1.2211755540804813e-05, 'epoch': 0.45} + + 45%|████▍ | 3292/7378 [11:17:44<14:05:37, 12.42s/it] + 45%|████▍ | 3293/7378 [11:17:56<14:06:28, 12.43s/it] + +{'loss': 0.3993, 'learning_rate': 1.2207473902422506e-05, 'epoch': 0.45} + + 45%|████▍ | 3293/7378 [11:17:56<14:06:28, 12.43s/it] + 45%|████▍ | 3294/7378 [11:18:08<13:57:29, 12.30s/it] + +{'loss': 0.5827, 'learning_rate': 1.2203191838584203e-05, 'epoch': 0.45} + + 45%|████▍ | 3294/7378 [11:18:08<13:57:29, 12.30s/it] + 45%|████▍ | 3295/7378 [11:18:20<13:52:44, 12.24s/it] + +{'loss': 0.4525, 'learning_rate': 1.2198909350115198e-05, 'epoch': 0.45} + + 45%|████▍ | 3295/7378 [11:18:20<13:52:44, 12.24s/it] + 45%|████▍ | 3296/7378 [11:18:33<13:51:50, 12.23s/it] + +{'loss': 0.4593, 'learning_rate': 1.219462643784088e-05, 'epoch': 0.45} + + 45%|████▍ | 3296/7378 [11:18:33<13:51:50, 12.23s/it] + 45%|████▍ | 3297/7378 [11:18:45<13:53:02, 12.25s/it] + +{'loss': 0.4168, 'learning_rate': 1.2190343102586707e-05, 'epoch': 0.45} + + 45%|████▍ | 3297/7378 [11:18:45<13:53:02, 12.25s/it] + 45%|████▍ | 3298/7378 [11:18:57<13:52:21, 12.24s/it] + +{'loss': 0.4631, 'learning_rate': 1.2186059345178228e-05, 'epoch': 0.45} + + 45%|████▍ | 3298/7378 [11:18:57<13:52:21, 12.24s/it] + 45%|████▍ | 3299/7378 [11:19:09<13:53:53, 12.27s/it] + +{'loss': 0.4331, 'learning_rate': 1.2181775166441067e-05, 'epoch': 0.45} + + 45%|████▍ | 3299/7378 [11:19:09<13:53:53, 12.27s/it] + 45%|████▍ | 3300/7378 [11:19:22<13:51:59, 12.24s/it] + +{'loss': 0.4711, 'learning_rate': 1.2177490567200938e-05, 'epoch': 0.45} + + 45%|████▍ | 3300/7378 [11:19:22<13:51:59, 12.24s/it] + 45%|████▍ | 3301/7378 [11:19:34<13:49:30, 12.21s/it] + +{'loss': 0.4449, 'learning_rate': 1.2173205548283626e-05, 'epoch': 0.45} + + 45%|████▍ | 3301/7378 [11:19:34<13:49:30, 12.21s/it] + 45%|████▍ | 3302/7378 [11:19:46<13:46:20, 12.16s/it] + +{'loss': 0.5614, 'learning_rate': 1.2168920110515002e-05, 
'epoch': 0.45} + + 45%|████▍ | 3302/7378 [11:19:46<13:46:20, 12.16s/it] + 45%|████▍ | 3303/7378 [11:19:58<13:50:13, 12.22s/it] + +{'loss': 0.3813, 'learning_rate': 1.2164634254721017e-05, 'epoch': 0.45} + + 45%|████▍ | 3303/7378 [11:19:58<13:50:13, 12.22s/it] + 45%|████▍ | 3304/7378 [11:20:11<14:01:27, 12.39s/it] + +{'loss': 0.5013, 'learning_rate': 1.2160347981727704e-05, 'epoch': 0.45} + + 45%|████▍ | 3304/7378 [11:20:11<14:01:27, 12.39s/it] + 45%|████▍ | 3305/7378 [11:20:23<13:59:57, 12.37s/it] + +{'loss': 0.4649, 'learning_rate': 1.2156061292361174e-05, 'epoch': 0.45} + + 45%|████▍ | 3305/7378 [11:20:23<13:59:57, 12.37s/it] + 45%|████▍ | 3306/7378 [11:20:36<14:06:36, 12.47s/it] + +{'loss': 0.5349, 'learning_rate': 1.215177418744762e-05, 'epoch': 0.45} + + 45%|████▍ | 3306/7378 [11:20:36<14:06:36, 12.47s/it] + 45%|████▍ | 3307/7378 [11:20:48<13:57:32, 12.34s/it] + +{'loss': 0.4692, 'learning_rate': 1.214748666781331e-05, 'epoch': 0.45} + + 45%|████▍ | 3307/7378 [11:20:48<13:57:32, 12.34s/it] + 45%|████▍ | 3308/7378 [11:21:00<14:00:29, 12.39s/it] + +{'loss': 0.5128, 'learning_rate': 1.2143198734284602e-05, 'epoch': 0.45} + + 45%|████▍ | 3308/7378 [11:21:00<14:00:29, 12.39s/it] + 45%|████▍ | 3309/7378 [11:21:13<13:57:09, 12.34s/it] + +{'loss': 0.4439, 'learning_rate': 1.2138910387687926e-05, 'epoch': 0.45} + + 45%|████▍ | 3309/7378 [11:21:13<13:57:09, 12.34s/it] + 45%|████▍ | 3310/7378 [11:21:25<13:55:56, 12.33s/it] + +{'loss': 0.4639, 'learning_rate': 1.2134621628849789e-05, 'epoch': 0.45} + + 45%|████▍ | 3310/7378 [11:21:25<13:55:56, 12.33s/it] + 45%|████▍ | 3311/7378 [11:21:37<13:56:42, 12.34s/it] + +{'loss': 0.4681, 'learning_rate': 1.2130332458596793e-05, 'epoch': 0.45} + + 45%|████▍ | 3311/7378 [11:21:37<13:56:42, 12.34s/it] + 45%|████▍ | 3312/7378 [11:21:50<13:52:49, 12.29s/it] + +{'loss': 0.4702, 'learning_rate': 1.2126042877755595e-05, 'epoch': 0.45} + + 45%|████▍ | 3312/7378 [11:21:50<13:52:49, 12.29s/it] + 45%|████▍ | 3313/7378 [11:22:02<13:57:08, 
12.36s/it] + +{'loss': 0.4782, 'learning_rate': 1.2121752887152953e-05, 'epoch': 0.45} + + 45%|████▍ | 3313/7378 [11:22:02<13:57:08, 12.36s/it] + 45%|████▍ | 3314/7378 [11:22:14<13:44:21, 12.17s/it] + +{'loss': 0.4481, 'learning_rate': 1.2117462487615695e-05, 'epoch': 0.45} + + 45%|████▍ | 3314/7378 [11:22:14<13:44:21, 12.17s/it] + 45%|████▍ | 3315/7378 [11:22:26<13:51:34, 12.28s/it] + +{'loss': 0.4314, 'learning_rate': 1.2113171679970725e-05, 'epoch': 0.45} + + 45%|████▍ | 3315/7378 [11:22:26<13:51:34, 12.28s/it] + 45%|████▍ | 3316/7378 [11:22:39<13:54:58, 12.33s/it] + +{'loss': 0.4708, 'learning_rate': 1.2108880465045032e-05, 'epoch': 0.45} + + 45%|████▍ | 3316/7378 [11:22:39<13:54:58, 12.33s/it] + 45%|████▍ | 3317/7378 [11:22:51<13:51:30, 12.29s/it] + +{'loss': 0.4675, 'learning_rate': 1.210458884366568e-05, 'epoch': 0.45} + + 45%|████▍ | 3317/7378 [11:22:51<13:51:30, 12.29s/it] + 45%|████▍ | 3318/7378 [11:23:03<13:47:50, 12.23s/it] + +{'loss': 0.4656, 'learning_rate': 1.2100296816659807e-05, 'epoch': 0.45} + + 45%|████▍ | 3318/7378 [11:23:03<13:47:50, 12.23s/it] + 45%|████▍ | 3319/7378 [11:23:16<13:56:48, 12.37s/it] + +{'loss': 0.4048, 'learning_rate': 1.2096004384854642e-05, 'epoch': 0.45} + + 45%|████▍ | 3319/7378 [11:23:16<13:56:48, 12.37s/it] + 45%|████▍ | 3320/7378 [11:23:28<13:48:00, 12.24s/it] + +{'loss': 0.4713, 'learning_rate': 1.209171154907748e-05, 'epoch': 0.45} + + 45%|████▍ | 3320/7378 [11:23:28<13:48:00, 12.24s/it] + 45%|████▌ | 3321/7378 [11:23:40<13:45:34, 12.21s/it] + +{'loss': 0.4932, 'learning_rate': 1.2087418310155694e-05, 'epoch': 0.45} + + 45%|████▌ | 3321/7378 [11:23:40<13:45:34, 12.21s/it] + 45%|████▌ | 3322/7378 [11:23:52<13:49:16, 12.27s/it] + +{'loss': 0.5244, 'learning_rate': 1.2083124668916745e-05, 'epoch': 0.45} + + 45%|████▌ | 3322/7378 [11:23:52<13:49:16, 12.27s/it] + 45%|████▌ | 3323/7378 [11:24:05<13:53:14, 12.33s/it] + +{'loss': 0.4425, 'learning_rate': 1.207883062618816e-05, 'epoch': 0.45} + + 45%|████▌ | 3323/7378 
[11:24:05<13:53:14, 12.33s/it] + 45%|████▌ | 3324/7378 [11:24:17<13:46:31, 12.23s/it] + +{'loss': 0.4608, 'learning_rate': 1.2074536182797551e-05, 'epoch': 0.45} + + 45%|████▌ | 3324/7378 [11:24:17<13:46:31, 12.23s/it] + 45%|████▌ | 3325/7378 [11:24:29<13:46:57, 12.24s/it] + +{'loss': 0.4886, 'learning_rate': 1.2070241339572605e-05, 'epoch': 0.45} + + 45%|████▌ | 3325/7378 [11:24:29<13:46:57, 12.24s/it] + 45%|████▌ | 3326/7378 [11:24:42<13:56:03, 12.38s/it] + +{'loss': 0.4597, 'learning_rate': 1.2065946097341086e-05, 'epoch': 0.45} + + 45%|████▌ | 3326/7378 [11:24:42<13:56:03, 12.38s/it] + 45%|████▌ | 3327/7378 [11:24:54<13:54:18, 12.36s/it] + +{'loss': 0.5006, 'learning_rate': 1.2061650456930834e-05, 'epoch': 0.45} + + 45%|████▌ | 3327/7378 [11:24:54<13:54:18, 12.36s/it] + 45%|████▌ | 3328/7378 [11:25:06<13:48:45, 12.28s/it] + +{'loss': 0.5275, 'learning_rate': 1.2057354419169763e-05, 'epoch': 0.45} + + 45%|████▌ | 3328/7378 [11:25:06<13:48:45, 12.28s/it] + 45%|████▌ | 3329/7378 [11:25:18<13:45:20, 12.23s/it] + +{'loss': 0.436, 'learning_rate': 1.2053057984885873e-05, 'epoch': 0.45} + + 45%|████▌ | 3329/7378 [11:25:18<13:45:20, 12.23s/it] + 45%|████▌ | 3330/7378 [11:25:31<13:51:10, 12.32s/it] + +{'loss': 0.4233, 'learning_rate': 1.2048761154907227e-05, 'epoch': 0.45} + + 45%|████▌ | 3330/7378 [11:25:31<13:51:10, 12.32s/it] + 45%|████▌ | 3331/7378 [11:25:43<13:55:18, 12.38s/it] + +{'loss': 0.4422, 'learning_rate': 1.2044463930061978e-05, 'epoch': 0.45} + + 45%|████▌ | 3331/7378 [11:25:43<13:55:18, 12.38s/it] + 45%|████▌ | 3332/7378 [11:25:56<13:54:06, 12.37s/it] + +{'loss': 0.4004, 'learning_rate': 1.2040166311178347e-05, 'epoch': 0.45} + + 45%|████▌ | 3332/7378 [11:25:56<13:54:06, 12.37s/it] + 45%|████▌ | 3333/7378 [11:26:08<13:59:58, 12.46s/it] + +{'loss': 0.4926, 'learning_rate': 1.2035868299084632e-05, 'epoch': 0.45} + + 45%|████▌ | 3333/7378 [11:26:08<13:59:58, 12.46s/it] + 45%|████▌ | 3334/7378 [11:26:21<14:06:01, 12.55s/it] + +{'loss': 0.4993, 
'learning_rate': 1.203156989460921e-05, 'epoch': 0.45} + + 45%|████▌ | 3334/7378 [11:26:21<14:06:01, 12.55s/it] + 45%|████▌ | 3335/7378 [11:26:33<13:59:20, 12.46s/it] + +{'loss': 0.4542, 'learning_rate': 1.2027271098580527e-05, 'epoch': 0.45} + + 45%|████▌ | 3335/7378 [11:26:33<13:59:20, 12.46s/it] + 45%|████▌ | 3336/7378 [11:26:46<13:56:52, 12.42s/it] + +{'loss': 0.4687, 'learning_rate': 1.2022971911827113e-05, 'epoch': 0.45} + + 45%|████▌ | 3336/7378 [11:26:46<13:56:52, 12.42s/it] + 45%|████▌ | 3337/7378 [11:26:58<13:55:44, 12.41s/it] + +{'loss': 0.4356, 'learning_rate': 1.2018672335177562e-05, 'epoch': 0.45} + + 45%|████▌ | 3337/7378 [11:26:58<13:55:44, 12.41s/it] + 45%|████▌ | 3338/7378 [11:27:10<13:55:43, 12.41s/it] + +{'loss': 0.4871, 'learning_rate': 1.2014372369460559e-05, 'epoch': 0.45} + + 45%|████▌ | 3338/7378 [11:27:10<13:55:43, 12.41s/it] + 45%|████▌ | 3339/7378 [11:27:23<14:00:45, 12.49s/it] + +{'loss': 0.4385, 'learning_rate': 1.2010072015504845e-05, 'epoch': 0.45} + + 45%|████▌ | 3339/7378 [11:27:23<14:00:45, 12.49s/it] + 45%|████▌ | 3340/7378 [11:27:35<13:55:11, 12.41s/it] + +{'loss': 0.4559, 'learning_rate': 1.2005771274139257e-05, 'epoch': 0.45} + + 45%|████▌ | 3340/7378 [11:27:35<13:55:11, 12.41s/it] + 45%|████▌ | 3341/7378 [11:27:48<13:59:18, 12.47s/it] + +{'loss': 0.4382, 'learning_rate': 1.2001470146192689e-05, 'epoch': 0.45} + + 45%|████▌ | 3341/7378 [11:27:48<13:59:18, 12.47s/it] + 45%|████▌ | 3342/7378 [11:28:01<14:01:28, 12.51s/it] + +{'loss': 0.4209, 'learning_rate': 1.1997168632494111e-05, 'epoch': 0.45} + + 45%|████▌ | 3342/7378 [11:28:01<14:01:28, 12.51s/it] + 45%|████▌ | 3343/7378 [11:28:13<13:50:58, 12.36s/it] + +{'loss': 0.459, 'learning_rate': 1.1992866733872585e-05, 'epoch': 0.45} + + 45%|████▌ | 3343/7378 [11:28:13<13:50:58, 12.36s/it] + 45%|████▌ | 3344/7378 [11:28:25<13:43:08, 12.24s/it] + +{'loss': 0.4832, 'learning_rate': 1.1988564451157223e-05, 'epoch': 0.45} + + 45%|████▌ | 3344/7378 [11:28:25<13:43:08, 12.24s/it] + 
45%|████▌ | 3345/7378 [11:28:36<13:37:30, 12.16s/it] + +{'loss': 0.4575, 'learning_rate': 1.1984261785177231e-05, 'epoch': 0.45} + + 45%|████▌ | 3345/7378 [11:28:36<13:37:30, 12.16s/it] + 45%|████▌ | 3346/7378 [11:28:49<13:46:11, 12.29s/it] + +{'loss': 0.526, 'learning_rate': 1.1979958736761872e-05, 'epoch': 0.45} + + 45%|████▌ | 3346/7378 [11:28:49<13:46:11, 12.29s/it] + 45%|████▌ | 3347/7378 [11:29:01<13:43:21, 12.26s/it] + +{'loss': 0.4443, 'learning_rate': 1.1975655306740501e-05, 'epoch': 0.45} + + 45%|████▌ | 3347/7378 [11:29:01<13:43:21, 12.26s/it] + 45%|████▌ | 3348/7378 [11:29:13<13:35:02, 12.13s/it] + +{'loss': 0.5161, 'learning_rate': 1.1971351495942527e-05, 'epoch': 0.45} + + 45%|████▌ | 3348/7378 [11:29:13<13:35:02, 12.13s/it] + 45%|████▌ | 3349/7378 [11:29:25<13:33:57, 12.12s/it] + +{'loss': 0.5095, 'learning_rate': 1.196704730519745e-05, 'epoch': 0.45} + + 45%|████▌ | 3349/7378 [11:29:25<13:33:57, 12.12s/it] + 45%|████▌ | 3350/7378 [11:29:38<13:44:06, 12.28s/it] + +{'loss': 0.4235, 'learning_rate': 1.196274273533483e-05, 'epoch': 0.45} + + 45%|████▌ | 3350/7378 [11:29:38<13:44:06, 12.28s/it] + 45%|████▌ | 3351/7378 [11:29:50<13:46:26, 12.31s/it] + +{'loss': 0.4548, 'learning_rate': 1.1958437787184306e-05, 'epoch': 0.45} + + 45%|████▌ | 3351/7378 [11:29:50<13:46:26, 12.31s/it] + 45%|████▌ | 3352/7378 [11:30:02<13:43:35, 12.27s/it] + +{'loss': 0.5173, 'learning_rate': 1.1954132461575596e-05, 'epoch': 0.45} + + 45%|████▌ | 3352/7378 [11:30:02<13:43:35, 12.27s/it] + 45%|████▌ | 3353/7378 [11:30:15<13:46:19, 12.32s/it] + +{'loss': 0.4635, 'learning_rate': 1.1949826759338469e-05, 'epoch': 0.45} + + 45%|████▌ | 3353/7378 [11:30:15<13:46:19, 12.32s/it] + 45%|████▌ | 3354/7378 [11:30:27<13:42:55, 12.27s/it] + +{'loss': 0.4713, 'learning_rate': 1.19455206813028e-05, 'epoch': 0.45} + + 45%|████▌ | 3354/7378 [11:30:27<13:42:55, 12.27s/it] + 45%|████▌ | 3355/7378 [11:30:39<13:40:57, 12.24s/it] + +{'loss': 0.4459, 'learning_rate': 1.1941214228298508e-05, 'epoch': 
0.45} + + 45%|████▌ | 3355/7378 [11:30:39<13:40:57, 12.24s/it] + 45%|████▌ | 3356/7378 [11:30:52<13:44:40, 12.30s/it] + +{'loss': 0.4064, 'learning_rate': 1.1936907401155592e-05, 'epoch': 0.45} + + 45%|████▌ | 3356/7378 [11:30:52<13:44:40, 12.30s/it] + 46%|████▌ | 3357/7378 [11:31:04<13:51:14, 12.40s/it] + +{'loss': 0.4319, 'learning_rate': 1.1932600200704131e-05, 'epoch': 0.46} + + 46%|████▌ | 3357/7378 [11:31:04<13:51:14, 12.40s/it] + 46%|████▌ | 3358/7378 [11:31:17<13:58:53, 12.52s/it] + +{'loss': 0.4875, 'learning_rate': 1.1928292627774268e-05, 'epoch': 0.46} + + 46%|████▌ | 3358/7378 [11:31:17<13:58:53, 12.52s/it] + 46%|████▌ | 3359/7378 [11:31:29<13:48:29, 12.37s/it] + +{'loss': 0.4334, 'learning_rate': 1.1923984683196222e-05, 'epoch': 0.46} + + 46%|████▌ | 3359/7378 [11:31:29<13:48:29, 12.37s/it] + 46%|████▌ | 3360/7378 [11:31:41<13:46:08, 12.34s/it] + +{'loss': 0.4786, 'learning_rate': 1.191967636780028e-05, 'epoch': 0.46} + + 46%|████▌ | 3360/7378 [11:31:41<13:46:08, 12.34s/it] + 46%|████▌ | 3361/7378 [11:31:53<13:42:34, 12.29s/it] + +{'loss': 0.4925, 'learning_rate': 1.1915367682416801e-05, 'epoch': 0.46} + + 46%|████▌ | 3361/7378 [11:31:53<13:42:34, 12.29s/it] + 46%|████▌ | 3362/7378 [11:32:06<13:39:27, 12.24s/it] + +{'loss': 0.4835, 'learning_rate': 1.1911058627876222e-05, 'epoch': 0.46} + + 46%|████▌ | 3362/7378 [11:32:06<13:39:27, 12.24s/it] + 46%|████▌ | 3363/7378 [11:32:18<13:37:46, 12.22s/it] + +{'loss': 0.4399, 'learning_rate': 1.1906749205009036e-05, 'epoch': 0.46} + + 46%|████▌ | 3363/7378 [11:32:18<13:37:46, 12.22s/it] + 46%|████▌ | 3364/7378 [11:32:30<13:39:13, 12.25s/it] + +{'loss': 0.4692, 'learning_rate': 1.1902439414645828e-05, 'epoch': 0.46} + + 46%|████▌ | 3364/7378 [11:32:30<13:39:13, 12.25s/it] + 46%|████▌ | 3365/7378 [11:32:42<13:40:14, 12.26s/it] + +{'loss': 0.3588, 'learning_rate': 1.1898129257617234e-05, 'epoch': 0.46} + + 46%|████▌ | 3365/7378 [11:32:42<13:40:14, 12.26s/it] + 46%|████▌ | 3366/7378 [11:32:55<13:45:27, 12.34s/it] + 
+{'loss': 0.4664, 'learning_rate': 1.1893818734753975e-05, 'epoch': 0.46} + + 46%|████▌ | 3366/7378 [11:32:55<13:45:27, 12.34s/it] + 46%|████▌ | 3367/7378 [11:33:07<13:48:11, 12.39s/it] + +{'loss': 0.4797, 'learning_rate': 1.1889507846886832e-05, 'epoch': 0.46} + + 46%|████▌ | 3367/7378 [11:33:07<13:48:11, 12.39s/it] + 46%|████▌ | 3368/7378 [11:33:19<13:40:49, 12.28s/it] + +{'loss': 0.4418, 'learning_rate': 1.1885196594846666e-05, 'epoch': 0.46} + + 46%|████▌ | 3368/7378 [11:33:19<13:40:49, 12.28s/it] + 46%|████▌ | 3369/7378 [11:33:32<13:45:26, 12.35s/it] + +{'loss': 0.4327, 'learning_rate': 1.1880884979464398e-05, 'epoch': 0.46} + + 46%|████▌ | 3369/7378 [11:33:32<13:45:26, 12.35s/it] + 46%|████▌ | 3370/7378 [11:33:44<13:41:23, 12.30s/it] + +{'loss': 0.4107, 'learning_rate': 1.1876573001571028e-05, 'epoch': 0.46} + + 46%|████▌ | 3370/7378 [11:33:44<13:41:23, 12.30s/it] + 46%|████▌ | 3371/7378 [11:33:56<13:36:18, 12.22s/it] + +{'loss': 0.5108, 'learning_rate': 1.1872260661997623e-05, 'epoch': 0.46} + + 46%|████▌ | 3371/7378 [11:33:56<13:36:18, 12.22s/it] + 46%|████▌ | 3372/7378 [11:34:08<13:35:59, 12.22s/it] + +{'loss': 0.479, 'learning_rate': 1.1867947961575313e-05, 'epoch': 0.46} + + 46%|████▌ | 3372/7378 [11:34:08<13:35:59, 12.22s/it] + 46%|████▌ | 3373/7378 [11:34:21<13:36:13, 12.23s/it] + +{'loss': 0.478, 'learning_rate': 1.186363490113531e-05, 'epoch': 0.46} + + 46%|████▌ | 3373/7378 [11:34:21<13:36:13, 12.23s/it] + 46%|████▌ | 3374/7378 [11:34:33<13:39:35, 12.28s/it] + +{'loss': 0.553, 'learning_rate': 1.1859321481508885e-05, 'epoch': 0.46} + + 46%|████▌ | 3374/7378 [11:34:33<13:39:35, 12.28s/it] + 46%|████▌ | 3375/7378 [11:34:46<13:47:48, 12.41s/it] + +{'loss': 0.4711, 'learning_rate': 1.1855007703527382e-05, 'epoch': 0.46} + + 46%|████▌ | 3375/7378 [11:34:46<13:47:48, 12.41s/it] + 46%|████▌ | 3376/7378 [11:34:58<13:42:19, 12.33s/it] + +{'loss': 0.4657, 'learning_rate': 1.1850693568022218e-05, 'epoch': 0.46} + + 46%|████▌ | 3376/7378 [11:34:58<13:42:19, 
12.33s/it] + 46%|████▌ | 3377/7378 [11:35:10<13:45:44, 12.38s/it] + +{'loss': 0.4661, 'learning_rate': 1.184637907582487e-05, 'epoch': 0.46} + + 46%|████▌ | 3377/7378 [11:35:10<13:45:44, 12.38s/it] + 46%|████▌ | 3378/7378 [11:35:23<13:40:40, 12.31s/it] + +{'loss': 0.4676, 'learning_rate': 1.1842064227766891e-05, 'epoch': 0.46} + + 46%|████▌ | 3378/7378 [11:35:23<13:40:40, 12.31s/it] + 46%|████▌ | 3379/7378 [11:35:35<13:36:22, 12.25s/it] + +{'loss': 0.4433, 'learning_rate': 1.1837749024679902e-05, 'epoch': 0.46} + + 46%|████▌ | 3379/7378 [11:35:35<13:36:22, 12.25s/it] + 46%|████▌ | 3380/7378 [11:35:47<13:39:41, 12.30s/it] + +{'loss': 0.4637, 'learning_rate': 1.183343346739559e-05, 'epoch': 0.46} + + 46%|████▌ | 3380/7378 [11:35:47<13:39:41, 12.30s/it] + 46%|████▌ | 3381/7378 [11:35:59<13:33:23, 12.21s/it] + +{'loss': 0.4038, 'learning_rate': 1.1829117556745706e-05, 'epoch': 0.46} + + 46%|████▌ | 3381/7378 [11:35:59<13:33:23, 12.21s/it] + 46%|████▌ | 3382/7378 [11:36:11<13:29:54, 12.16s/it] + +{'loss': 0.499, 'learning_rate': 1.1824801293562082e-05, 'epoch': 0.46} + + 46%|████▌ | 3382/7378 [11:36:11<13:29:54, 12.16s/it] + 46%|████▌ | 3383/7378 [11:36:24<13:34:26, 12.23s/it] + +{'loss': 0.4516, 'learning_rate': 1.1820484678676607e-05, 'epoch': 0.46} + + 46%|████▌ | 3383/7378 [11:36:24<13:34:26, 12.23s/it] + 46%|████▌ | 3384/7378 [11:36:36<13:38:31, 12.30s/it] + +{'loss': 0.452, 'learning_rate': 1.1816167712921237e-05, 'epoch': 0.46} + + 46%|████▌ | 3384/7378 [11:36:36<13:38:31, 12.30s/it] + 46%|████▌ | 3385/7378 [11:36:48<13:42:21, 12.36s/it] + +{'loss': 0.4645, 'learning_rate': 1.1811850397128007e-05, 'epoch': 0.46} + + 46%|████▌ | 3385/7378 [11:36:48<13:42:21, 12.36s/it] + 46%|████▌ | 3386/7378 [11:37:01<13:44:41, 12.40s/it] + +{'loss': 0.5254, 'learning_rate': 1.1807532732129004e-05, 'epoch': 0.46} + + 46%|████▌ | 3386/7378 [11:37:01<13:44:41, 12.40s/it] + 46%|████▌ | 3387/7378 [11:37:13<13:39:33, 12.32s/it] + +{'loss': 0.475, 'learning_rate': 
1.1803214718756395e-05, 'epoch': 0.46} + + 46%|████▌ | 3387/7378 [11:37:13<13:39:33, 12.32s/it] + 46%|████▌ | 3388/7378 [11:37:25<13:31:28, 12.20s/it] + +{'loss': 0.3721, 'learning_rate': 1.1798896357842406e-05, 'epoch': 0.46} + + 46%|████▌ | 3388/7378 [11:37:25<13:31:28, 12.20s/it] + 46%|████▌ | 3389/7378 [11:37:38<13:40:27, 12.34s/it] + +{'loss': 0.5059, 'learning_rate': 1.179457765021934e-05, 'epoch': 0.46} + + 46%|████▌ | 3389/7378 [11:37:38<13:40:27, 12.34s/it] + 46%|████▌ | 3390/7378 [11:37:50<13:44:36, 12.41s/it] + +{'loss': 0.5072, 'learning_rate': 1.1790258596719553e-05, 'epoch': 0.46} + + 46%|████▌ | 3390/7378 [11:37:50<13:44:36, 12.41s/it] + 46%|████▌ | 3391/7378 [11:38:03<13:43:33, 12.39s/it] + +{'loss': 0.4653, 'learning_rate': 1.1785939198175481e-05, 'epoch': 0.46} + + 46%|████▌ | 3391/7378 [11:38:03<13:43:33, 12.39s/it] + 46%|████▌ | 3392/7378 [11:38:15<13:35:00, 12.27s/it] + +{'loss': 0.5091, 'learning_rate': 1.1781619455419615e-05, 'epoch': 0.46} + + 46%|████▌ | 3392/7378 [11:38:15<13:35:00, 12.27s/it] + 46%|████▌ | 3393/7378 [11:38:27<13:32:23, 12.23s/it] + +{'loss': 0.444, 'learning_rate': 1.177729936928452e-05, 'epoch': 0.46} + + 46%|████▌ | 3393/7378 [11:38:27<13:32:23, 12.23s/it] + 46%|████▌ | 3394/7378 [11:38:39<13:33:48, 12.26s/it] + +{'loss': 0.3985, 'learning_rate': 1.1772978940602826e-05, 'epoch': 0.46} + + 46%|████▌ | 3394/7378 [11:38:39<13:33:48, 12.26s/it] + 46%|████▌ | 3395/7378 [11:38:52<13:37:53, 12.32s/it] + +{'loss': 0.4287, 'learning_rate': 1.1768658170207225e-05, 'epoch': 0.46} + + 46%|████▌ | 3395/7378 [11:38:52<13:37:53, 12.32s/it] + 46%|████▌ | 3396/7378 [11:39:04<13:40:22, 12.36s/it] + +{'loss': 0.4859, 'learning_rate': 1.1764337058930482e-05, 'epoch': 0.46} + + 46%|████▌ | 3396/7378 [11:39:04<13:40:22, 12.36s/it] + 46%|████▌ | 3397/7378 [11:39:16<13:38:00, 12.33s/it] + +{'loss': 0.4795, 'learning_rate': 1.1760015607605417e-05, 'epoch': 0.46} + + 46%|████▌ | 3397/7378 [11:39:16<13:38:00, 12.33s/it] + 46%|████▌ | 3398/7378 
[11:39:28<13:35:38, 12.30s/it] + +{'loss': 0.4886, 'learning_rate': 1.1755693817064927e-05, 'epoch': 0.46} + + 46%|████▌ | 3398/7378 [11:39:28<13:35:38, 12.30s/it] + 46%|████▌ | 3399/7378 [11:39:41<13:38:14, 12.34s/it] + +{'loss': 0.4362, 'learning_rate': 1.1751371688141973e-05, 'epoch': 0.46} + + 46%|████▌ | 3399/7378 [11:39:41<13:38:14, 12.34s/it] + 46%|████▌ | 3400/7378 [11:39:53<13:25:20, 12.15s/it] + +{'loss': 0.4757, 'learning_rate': 1.174704922166957e-05, 'epoch': 0.46} + + 46%|████▌ | 3400/7378 [11:39:53<13:25:20, 12.15s/it] + 46%|████▌ | 3401/7378 [11:40:05<13:39:54, 12.37s/it] + +{'loss': 0.4172, 'learning_rate': 1.1742726418480808e-05, 'epoch': 0.46} + + 46%|████▌ | 3401/7378 [11:40:05<13:39:54, 12.37s/it] + 46%|████▌ | 3402/7378 [11:40:18<13:48:33, 12.50s/it] + +{'loss': 0.4793, 'learning_rate': 1.1738403279408841e-05, 'epoch': 0.46} + + 46%|████▌ | 3402/7378 [11:40:18<13:48:33, 12.50s/it] + 46%|████▌ | 3403/7378 [11:40:30<13:40:06, 12.38s/it] + +{'loss': 0.4612, 'learning_rate': 1.1734079805286887e-05, 'epoch': 0.46} + + 46%|████▌ | 3403/7378 [11:40:30<13:40:06, 12.38s/it] + 46%|████▌ | 3404/7378 [11:40:43<13:36:18, 12.32s/it] + +{'loss': 0.4267, 'learning_rate': 1.1729755996948224e-05, 'epoch': 0.46} + + 46%|████▌ | 3404/7378 [11:40:43<13:36:18, 12.32s/it] + 46%|████▌ | 3405/7378 [11:40:54<13:25:03, 12.16s/it] + +{'loss': 0.5013, 'learning_rate': 1.1725431855226203e-05, 'epoch': 0.46} + + 46%|████▌ | 3405/7378 [11:40:54<13:25:03, 12.16s/it] + 46%|████▌ | 3406/7378 [11:41:07<13:26:25, 12.18s/it] + +{'loss': 0.4412, 'learning_rate': 1.1721107380954233e-05, 'epoch': 0.46} + + 46%|████▌ | 3406/7378 [11:41:07<13:26:25, 12.18s/it] + 46%|████▌ | 3407/7378 [11:41:20<13:47:16, 12.50s/it] + +{'loss': 0.3863, 'learning_rate': 1.1716782574965783e-05, 'epoch': 0.46} + + 46%|████▌ | 3407/7378 [11:41:20<13:47:16, 12.50s/it] + 46%|████▌ | 3408/7378 [11:41:32<13:44:26, 12.46s/it] + +{'loss': 0.4623, 'learning_rate': 1.1712457438094403e-05, 'epoch': 0.46} + + 46%|████▌ 
| 3408/7378 [11:41:32<13:44:26, 12.46s/it] + 46%|████▌ | 3409/7378 [11:41:45<13:53:40, 12.60s/it] + +{'loss': 0.4802, 'learning_rate': 1.1708131971173685e-05, 'epoch': 0.46} + + 46%|████▌ | 3409/7378 [11:41:45<13:53:40, 12.60s/it] + 46%|████▌ | 3410/7378 [11:41:57<13:44:44, 12.47s/it] + +{'loss': 0.467, 'learning_rate': 1.17038061750373e-05, 'epoch': 0.46} + + 46%|████▌ | 3410/7378 [11:41:57<13:44:44, 12.47s/it] + 46%|████▌ | 3411/7378 [11:42:09<13:38:41, 12.38s/it] + +{'loss': 0.47, 'learning_rate': 1.1699480050518974e-05, 'epoch': 0.46} + + 46%|████▌ | 3411/7378 [11:42:09<13:38:41, 12.38s/it] + 46%|████▌ | 3412/7378 [11:42:22<13:36:46, 12.36s/it] + +{'loss': 0.4368, 'learning_rate': 1.1695153598452507e-05, 'epoch': 0.46} + + 46%|████▌ | 3412/7378 [11:42:22<13:36:46, 12.36s/it] + 46%|████▋ | 3413/7378 [11:42:34<13:34:43, 12.33s/it] + +{'loss': 0.4109, 'learning_rate': 1.1690826819671748e-05, 'epoch': 0.46} + + 46%|████▋ | 3413/7378 [11:42:34<13:34:43, 12.33s/it] + 46%|████▋ | 3414/7378 [11:42:46<13:30:12, 12.26s/it] + +{'loss': 0.4908, 'learning_rate': 1.1686499715010616e-05, 'epoch': 0.46} + + 46%|████▋ | 3414/7378 [11:42:46<13:30:12, 12.26s/it] + 46%|████▋ | 3415/7378 [11:42:59<13:36:17, 12.36s/it] + +{'loss': 0.502, 'learning_rate': 1.1682172285303095e-05, 'epoch': 0.46} + + 46%|████▋ | 3415/7378 [11:42:59<13:36:17, 12.36s/it] + 46%|████▋ | 3416/7378 [11:43:11<13:30:41, 12.28s/it] + +{'loss': 0.4742, 'learning_rate': 1.1677844531383227e-05, 'epoch': 0.46} + + 46%|████▋ | 3416/7378 [11:43:11<13:30:41, 12.28s/it] + 46%|████▋ | 3417/7378 [11:43:23<13:35:24, 12.35s/it] + +{'loss': 0.3986, 'learning_rate': 1.1673516454085123e-05, 'epoch': 0.46} + + 46%|████▋ | 3417/7378 [11:43:23<13:35:24, 12.35s/it] + 46%|████▋ | 3418/7378 [11:43:35<13:27:02, 12.23s/it] + +{'loss': 0.4736, 'learning_rate': 1.1669188054242945e-05, 'epoch': 0.46} + + 46%|████▋ | 3418/7378 [11:43:35<13:27:02, 12.23s/it] + 46%|████▋ | 3419/7378 [11:43:50<14:20:30, 13.04s/it] + +{'loss': 0.4817, 
'learning_rate': 1.1664859332690932e-05, 'epoch': 0.46} + + 46%|████▋ | 3419/7378 [11:43:50<14:20:30, 13.04s/it] + 46%|████▋ | 3420/7378 [11:44:03<14:07:36, 12.85s/it] + +{'loss': 0.4431, 'learning_rate': 1.1660530290263375e-05, 'epoch': 0.46} + + 46%|████▋ | 3420/7378 [11:44:03<14:07:36, 12.85s/it] + 46%|████▋ | 3421/7378 [11:44:15<13:59:13, 12.73s/it] + +{'loss': 0.5234, 'learning_rate': 1.1656200927794624e-05, 'epoch': 0.46} + + 46%|████▋ | 3421/7378 [11:44:15<13:59:13, 12.73s/it] + 46%|████▋ | 3422/7378 [11:44:27<13:48:52, 12.57s/it] + +{'loss': 0.524, 'learning_rate': 1.1651871246119102e-05, 'epoch': 0.46} + + 46%|████▋ | 3422/7378 [11:44:27<13:48:52, 12.57s/it] + 46%|████▋ | 3423/7378 [11:44:40<13:51:23, 12.61s/it] + +{'loss': 0.4892, 'learning_rate': 1.1647541246071283e-05, 'epoch': 0.46} + + 46%|████▋ | 3423/7378 [11:44:40<13:51:23, 12.61s/it] + 46%|████▋ | 3424/7378 [11:44:52<13:47:03, 12.55s/it] + +{'loss': 0.5166, 'learning_rate': 1.1643210928485714e-05, 'epoch': 0.46} + + 46%|████▋ | 3424/7378 [11:44:52<13:47:03, 12.55s/it] + 46%|████▋ | 3425/7378 [11:45:05<13:43:04, 12.49s/it] + +{'loss': 0.4781, 'learning_rate': 1.1638880294196984e-05, 'epoch': 0.46} + + 46%|████▋ | 3425/7378 [11:45:05<13:43:04, 12.49s/it] + 46%|████▋ | 3426/7378 [11:45:17<13:40:22, 12.46s/it] + +{'loss': 0.4737, 'learning_rate': 1.1634549344039764e-05, 'epoch': 0.46} + + 46%|████▋ | 3426/7378 [11:45:17<13:40:22, 12.46s/it] + 46%|████▋ | 3427/7378 [11:45:30<13:40:25, 12.46s/it] + +{'loss': 0.4635, 'learning_rate': 1.1630218078848776e-05, 'epoch': 0.46} + + 46%|████▋ | 3427/7378 [11:45:30<13:40:25, 12.46s/it] + 46%|████▋ | 3428/7378 [11:45:42<13:40:30, 12.46s/it] + +{'loss': 0.4349, 'learning_rate': 1.1625886499458798e-05, 'epoch': 0.46} + + 46%|████▋ | 3428/7378 [11:45:42<13:40:30, 12.46s/it] + 46%|████▋ | 3429/7378 [11:45:54<13:30:20, 12.31s/it] + +{'loss': 0.4386, 'learning_rate': 1.1621554606704682e-05, 'epoch': 0.46} + + 46%|████▋ | 3429/7378 [11:45:54<13:30:20, 12.31s/it] + 
46%|████▋ | 3430/7378 [11:46:06<13:28:14, 12.28s/it] + +{'loss': 0.4198, 'learning_rate': 1.1617222401421324e-05, 'epoch': 0.46} + + 46%|████▋ | 3430/7378 [11:46:06<13:28:14, 12.28s/it] + 47%|████▋ | 3431/7378 [11:46:18<13:26:33, 12.26s/it] + +{'loss': 0.4879, 'learning_rate': 1.1612889884443694e-05, 'epoch': 0.47} + + 47%|████▋ | 3431/7378 [11:46:18<13:26:33, 12.26s/it] + 47%|████▋ | 3432/7378 [11:46:31<13:33:50, 12.37s/it] + +{'loss': 0.4497, 'learning_rate': 1.1608557056606815e-05, 'epoch': 0.47} + + 47%|████▋ | 3432/7378 [11:46:31<13:33:50, 12.37s/it] + 47%|████▋ | 3433/7378 [11:46:43<13:29:03, 12.31s/it] + +{'loss': 0.5128, 'learning_rate': 1.1604223918745775e-05, 'epoch': 0.47} + + 47%|████▋ | 3433/7378 [11:46:43<13:29:03, 12.31s/it] + 47%|████▋ | 3434/7378 [11:46:56<13:32:31, 12.36s/it] + +{'loss': 0.4802, 'learning_rate': 1.1599890471695711e-05, 'epoch': 0.47} + + 47%|████▋ | 3434/7378 [11:46:56<13:32:31, 12.36s/it] + 47%|████▋ | 3435/7378 [11:47:08<13:28:44, 12.31s/it] + +{'loss': 0.543, 'learning_rate': 1.1595556716291836e-05, 'epoch': 0.47} + + 47%|████▋ | 3435/7378 [11:47:08<13:28:44, 12.31s/it] + 47%|████▋ | 3436/7378 [11:47:20<13:20:08, 12.18s/it] + +{'loss': 0.4409, 'learning_rate': 1.1591222653369408e-05, 'epoch': 0.47} + + 47%|████▋ | 3436/7378 [11:47:20<13:20:08, 12.18s/it] + 47%|████▋ | 3437/7378 [11:47:32<13:14:38, 12.10s/it] + +{'loss': 0.4474, 'learning_rate': 1.1586888283763748e-05, 'epoch': 0.47} + + 47%|████▋ | 3437/7378 [11:47:32<13:14:38, 12.10s/it] + 47%|████▋ | 3438/7378 [11:47:44<13:21:05, 12.20s/it] + +{'loss': 0.4807, 'learning_rate': 1.1582553608310243e-05, 'epoch': 0.47} + + 47%|████▋ | 3438/7378 [11:47:44<13:21:05, 12.20s/it] + 47%|████▋ | 3439/7378 [11:47:57<13:28:30, 12.32s/it] + +{'loss': 0.4639, 'learning_rate': 1.1578218627844329e-05, 'epoch': 0.47} + + 47%|████▋ | 3439/7378 [11:47:57<13:28:30, 12.32s/it] + 47%|████▋ | 3440/7378 [11:48:09<13:25:50, 12.28s/it] + +{'loss': 0.5028, 'learning_rate': 1.157388334320151e-05, 
'epoch': 0.47} + + 47%|████▋ | 3440/7378 [11:48:09<13:25:50, 12.28s/it] + 47%|████▋ | 3441/7378 [11:48:21<13:19:25, 12.18s/it] + +{'loss': 0.4582, 'learning_rate': 1.156954775521734e-05, 'epoch': 0.47} + + 47%|████▋ | 3441/7378 [11:48:21<13:19:25, 12.18s/it] + 47%|████▋ | 3442/7378 [11:48:33<13:13:02, 12.09s/it] + +{'loss': 0.5083, 'learning_rate': 1.156521186472744e-05, 'epoch': 0.47} + + 47%|████▋ | 3442/7378 [11:48:33<13:13:02, 12.09s/it] + 47%|████▋ | 3443/7378 [11:48:45<13:21:30, 12.22s/it] + +{'loss': 0.4757, 'learning_rate': 1.1560875672567482e-05, 'epoch': 0.47} + + 47%|████▋ | 3443/7378 [11:48:45<13:21:30, 12.22s/it] + 47%|████▋ | 3444/7378 [11:48:57<13:17:34, 12.16s/it] + +{'loss': 0.5329, 'learning_rate': 1.15565391795732e-05, 'epoch': 0.47} + + 47%|████▋ | 3444/7378 [11:48:57<13:17:34, 12.16s/it] + 47%|████▋ | 3445/7378 [11:49:10<13:18:56, 12.19s/it] + +{'loss': 0.428, 'learning_rate': 1.1552202386580382e-05, 'epoch': 0.47} + + 47%|████▋ | 3445/7378 [11:49:10<13:18:56, 12.19s/it] + 47%|████▋ | 3446/7378 [11:49:22<13:17:53, 12.18s/it] + +{'loss': 0.4183, 'learning_rate': 1.154786529442488e-05, 'epoch': 0.47} + + 47%|████▋ | 3446/7378 [11:49:22<13:17:53, 12.18s/it] + 47%|████▋ | 3447/7378 [11:49:34<13:23:11, 12.26s/it] + +{'loss': 0.411, 'learning_rate': 1.1543527903942603e-05, 'epoch': 0.47} + + 47%|████▋ | 3447/7378 [11:49:34<13:23:11, 12.26s/it] + 47%|████▋ | 3448/7378 [11:49:47<13:29:58, 12.37s/it] + +{'loss': 0.4962, 'learning_rate': 1.1539190215969514e-05, 'epoch': 0.47} + + 47%|████▋ | 3448/7378 [11:49:47<13:29:58, 12.37s/it] + 47%|████▋ | 3449/7378 [11:49:59<13:31:22, 12.39s/it] + +{'loss': 0.4989, 'learning_rate': 1.1534852231341627e-05, 'epoch': 0.47} + + 47%|████▋ | 3449/7378 [11:49:59<13:31:22, 12.39s/it] + 47%|████▋ | 3450/7378 [11:50:12<13:33:44, 12.43s/it] + +{'loss': 0.4546, 'learning_rate': 1.1530513950895031e-05, 'epoch': 0.47} + + 47%|████▋ | 3450/7378 [11:50:12<13:33:44, 12.43s/it] + 47%|████▋ | 3451/7378 [11:50:27<14:32:46, 13.33s/it] 
+ +{'loss': 0.5095, 'learning_rate': 1.1526175375465853e-05, 'epoch': 0.47} + + 47%|████▋ | 3451/7378 [11:50:27<14:32:46, 13.33s/it] + 47%|████▋ | 3452/7378 [11:50:44<15:32:24, 14.25s/it] + +{'loss': 0.463, 'learning_rate': 1.1521836505890291e-05, 'epoch': 0.47} + + 47%|████▋ | 3452/7378 [11:50:44<15:32:24, 14.25s/it] + 47%|████▋ | 3453/7378 [11:50:56<14:58:44, 13.74s/it] + +{'loss': 0.4333, 'learning_rate': 1.151749734300459e-05, 'epoch': 0.47} + + 47%|████▋ | 3453/7378 [11:50:56<14:58:44, 13.74s/it] + 47%|████▋ | 3454/7378 [11:51:09<14:36:01, 13.39s/it] + +{'loss': 0.4975, 'learning_rate': 1.1513157887645061e-05, 'epoch': 0.47} + + 47%|████▋ | 3454/7378 [11:51:09<14:36:01, 13.39s/it] + 47%|████▋ | 3455/7378 [11:51:21<14:14:14, 13.07s/it] + +{'loss': 0.4182, 'learning_rate': 1.150881814064806e-05, 'epoch': 0.47} + + 47%|████▋ | 3455/7378 [11:51:21<14:14:14, 13.07s/it] + 47%|████▋ | 3456/7378 [11:51:33<13:56:24, 12.80s/it] + +{'loss': 0.3865, 'learning_rate': 1.1504478102850011e-05, 'epoch': 0.47} + + 47%|████▋ | 3456/7378 [11:51:33<13:56:24, 12.80s/it] + 47%|████▋ | 3457/7378 [11:51:45<13:45:55, 12.64s/it] + +{'loss': 0.4829, 'learning_rate': 1.1500137775087388e-05, 'epoch': 0.47} + + 47%|████▋ | 3457/7378 [11:51:45<13:45:55, 12.64s/it] + 47%|████▋ | 3458/7378 [11:52:01<14:52:06, 13.65s/it] + +{'loss': 0.5053, 'learning_rate': 1.1495797158196713e-05, 'epoch': 0.47} + + 47%|████▋ | 3458/7378 [11:52:01<14:52:06, 13.65s/it] + 47%|████▋ | 3459/7378 [11:52:17<15:35:51, 14.33s/it] + +{'loss': 0.4978, 'learning_rate': 1.1491456253014579e-05, 'epoch': 0.47} + + 47%|████▋ | 3459/7378 [11:52:17<15:35:51, 14.33s/it] + 47%|████▋ | 3460/7378 [11:52:29<14:45:47, 13.56s/it] + +{'loss': 0.4894, 'learning_rate': 1.1487115060377625e-05, 'epoch': 0.47} + + 47%|████▋ | 3460/7378 [11:52:29<14:45:47, 13.56s/it] + 47%|████▋ | 3461/7378 [11:52:44<15:14:21, 14.01s/it] + +{'loss': 0.5084, 'learning_rate': 1.148277358112255e-05, 'epoch': 0.47} + + 47%|████▋ | 3461/7378 [11:52:44<15:14:21, 
14.01s/it] + 47%|████▋ | 3462/7378 [11:52:57<14:47:25, 13.60s/it] + +{'loss': 0.508, 'learning_rate': 1.1478431816086104e-05, 'epoch': 0.47} + + 47%|████▋ | 3462/7378 [11:52:57<14:47:25, 13.60s/it] + 47%|████▋ | 3463/7378 [11:53:09<14:23:06, 13.23s/it] + +{'loss': 0.4549, 'learning_rate': 1.1474089766105094e-05, 'epoch': 0.47} + + 47%|████▋ | 3463/7378 [11:53:09<14:23:06, 13.23s/it] + 47%|████▋ | 3464/7378 [11:53:21<13:55:38, 12.81s/it] + +{'loss': 0.4747, 'learning_rate': 1.1469747432016386e-05, 'epoch': 0.47} + + 47%|████▋ | 3464/7378 [11:53:21<13:55:38, 12.81s/it] + 47%|████▋ | 3465/7378 [11:53:33<13:47:50, 12.69s/it] + +{'loss': 0.4018, 'learning_rate': 1.1465404814656893e-05, 'epoch': 0.47} + + 47%|████▋ | 3465/7378 [11:53:33<13:47:50, 12.69s/it] + 47%|████▋ | 3466/7378 [11:53:46<13:45:59, 12.67s/it] + +{'loss': 0.4555, 'learning_rate': 1.1461061914863587e-05, 'epoch': 0.47} + + 47%|████▋ | 3466/7378 [11:53:46<13:45:59, 12.67s/it] + 47%|████▋ | 3467/7378 [11:53:58<13:34:57, 12.50s/it] + +{'loss': 0.5813, 'learning_rate': 1.1456718733473492e-05, 'epoch': 0.47} + + 47%|████▋ | 3467/7378 [11:53:58<13:34:57, 12.50s/it] + 47%|████▋ | 3468/7378 [11:54:10<13:23:58, 12.34s/it] + +{'loss': 0.4443, 'learning_rate': 1.1452375271323695e-05, 'epoch': 0.47} + + 47%|████▋ | 3468/7378 [11:54:10<13:23:58, 12.34s/it] + 47%|████▋ | 3469/7378 [11:54:22<13:23:40, 12.34s/it] + +{'loss': 0.4368, 'learning_rate': 1.1448031529251325e-05, 'epoch': 0.47} + + 47%|████▋ | 3469/7378 [11:54:22<13:23:40, 12.34s/it] + 47%|████▋ | 3470/7378 [11:54:35<13:20:35, 12.29s/it] + +{'loss': 0.4428, 'learning_rate': 1.1443687508093567e-05, 'epoch': 0.47} + + 47%|████▋ | 3470/7378 [11:54:35<13:20:35, 12.29s/it] + 47%|████▋ | 3471/7378 [11:54:47<13:13:30, 12.19s/it] + +{'loss': 0.4494, 'learning_rate': 1.143934320868767e-05, 'epoch': 0.47} + + 47%|████▋ | 3471/7378 [11:54:47<13:13:30, 12.19s/it] + 47%|████▋ | 3472/7378 [11:54:59<13:18:40, 12.27s/it] + +{'loss': 0.4922, 'learning_rate': 
1.1434998631870923e-05, 'epoch': 0.47} + + 47%|████▋ | 3472/7378 [11:54:59<13:18:40, 12.27s/it] + 47%|████▋ | 3473/7378 [11:55:11<13:19:17, 12.28s/it] + +{'loss': 0.4279, 'learning_rate': 1.1430653778480682e-05, 'epoch': 0.47} + + 47%|████▋ | 3473/7378 [11:55:11<13:19:17, 12.28s/it] + 47%|████▋ | 3474/7378 [11:55:24<13:17:30, 12.26s/it] + +{'loss': 0.4517, 'learning_rate': 1.1426308649354346e-05, 'epoch': 0.47} + + 47%|████▋ | 3474/7378 [11:55:24<13:17:30, 12.26s/it] + 47%|████▋ | 3475/7378 [11:55:36<13:11:33, 12.17s/it] + +{'loss': 0.4562, 'learning_rate': 1.1421963245329368e-05, 'epoch': 0.47} + + 47%|████▋ | 3475/7378 [11:55:36<13:11:33, 12.17s/it] + 47%|████▋ | 3476/7378 [11:55:48<13:08:49, 12.13s/it] + +{'loss': 0.3873, 'learning_rate': 1.141761756724326e-05, 'epoch': 0.47} + + 47%|████▋ | 3476/7378 [11:55:48<13:08:49, 12.13s/it] + 47%|████▋ | 3477/7378 [11:55:59<13:05:18, 12.08s/it] + +{'loss': 0.4773, 'learning_rate': 1.1413271615933582e-05, 'epoch': 0.47} + + 47%|████▋ | 3477/7378 [11:56:00<13:05:18, 12.08s/it] + 47%|████▋ | 3478/7378 [11:56:12<13:07:46, 12.12s/it] + +{'loss': 0.4196, 'learning_rate': 1.1408925392237953e-05, 'epoch': 0.47} + + 47%|████▋ | 3478/7378 [11:56:12<13:07:46, 12.12s/it] + 47%|████▋ | 3479/7378 [11:56:24<13:05:42, 12.09s/it] + +{'loss': 0.4026, 'learning_rate': 1.140457889699403e-05, 'epoch': 0.47} + + 47%|████▋ | 3479/7378 [11:56:24<13:05:42, 12.09s/it] + 47%|████▋ | 3480/7378 [11:56:36<13:11:39, 12.19s/it] + +{'loss': 0.4308, 'learning_rate': 1.140023213103954e-05, 'epoch': 0.47} + + 47%|████▋ | 3480/7378 [11:56:36<13:11:39, 12.19s/it] + 47%|████▋ | 3481/7378 [11:56:49<13:15:17, 12.24s/it] + +{'loss': 0.5384, 'learning_rate': 1.1395885095212247e-05, 'epoch': 0.47} + + 47%|████▋ | 3481/7378 [11:56:49<13:15:17, 12.24s/it] + 47%|████▋ | 3482/7378 [11:57:01<13:15:31, 12.25s/it] + +{'loss': 0.4912, 'learning_rate': 1.1391537790349977e-05, 'epoch': 0.47} + + 47%|████▋ | 3482/7378 [11:57:01<13:15:31, 12.25s/it] + 47%|████▋ | 3483/7378 
[11:57:13<13:11:06, 12.19s/it] + +{'loss': 0.4791, 'learning_rate': 1.138719021729061e-05, 'epoch': 0.47} + + 47%|████▋ | 3483/7378 [11:57:13<13:11:06, 12.19s/it] + 47%|████▋ | 3484/7378 [11:57:25<13:17:01, 12.28s/it] + +{'loss': 0.4495, 'learning_rate': 1.1382842376872065e-05, 'epoch': 0.47} + + 47%|████▋ | 3484/7378 [11:57:25<13:17:01, 12.28s/it] + 47%|████▋ | 3485/7378 [11:57:38<13:25:18, 12.41s/it] + +{'loss': 0.445, 'learning_rate': 1.1378494269932326e-05, 'epoch': 0.47} + + 47%|████▋ | 3485/7378 [11:57:38<13:25:18, 12.41s/it] + 47%|████▋ | 3486/7378 [11:57:50<13:21:44, 12.36s/it] + +{'loss': 0.4522, 'learning_rate': 1.1374145897309416e-05, 'epoch': 0.47} + + 47%|████▋ | 3486/7378 [11:57:50<13:21:44, 12.36s/it] + 47%|████▋ | 3487/7378 [11:58:02<13:16:47, 12.29s/it] + +{'loss': 0.456, 'learning_rate': 1.1369797259841423e-05, 'epoch': 0.47} + + 47%|████▋ | 3487/7378 [11:58:02<13:16:47, 12.29s/it] + 47%|████▋ | 3488/7378 [11:58:15<13:15:07, 12.26s/it] + +{'loss': 0.4259, 'learning_rate': 1.1365448358366473e-05, 'epoch': 0.47} + + 47%|████▋ | 3488/7378 [11:58:15<13:15:07, 12.26s/it] + 47%|████▋ | 3489/7378 [11:58:27<13:22:39, 12.38s/it] + +{'loss': 0.5126, 'learning_rate': 1.1361099193722753e-05, 'epoch': 0.47} + + 47%|████▋ | 3489/7378 [11:58:27<13:22:39, 12.38s/it] + 47%|████▋ | 3490/7378 [11:58:40<13:33:11, 12.55s/it] + +{'loss': 0.5563, 'learning_rate': 1.1356749766748491e-05, 'epoch': 0.47} + + 47%|████▋ | 3490/7378 [11:58:40<13:33:11, 12.55s/it] + 47%|████▋ | 3491/7378 [11:58:52<13:26:32, 12.45s/it] + +{'loss': 0.4924, 'learning_rate': 1.1352400078281977e-05, 'epoch': 0.47} + + 47%|████▋ | 3491/7378 [11:58:52<13:26:32, 12.45s/it] + 47%|████▋ | 3492/7378 [11:59:05<13:26:57, 12.46s/it] + +{'loss': 0.4699, 'learning_rate': 1.1348050129161542e-05, 'epoch': 0.47} + + 47%|████▋ | 3492/7378 [11:59:05<13:26:57, 12.46s/it] + 47%|████▋ | 3493/7378 [11:59:17<13:23:15, 12.41s/it] + +{'loss': 0.3839, 'learning_rate': 1.1343699920225571e-05, 'epoch': 0.47} + + 47%|████▋ | 
3493/7378 [11:59:17<13:23:15, 12.41s/it] + 47%|████▋ | 3494/7378 [11:59:29<13:17:05, 12.31s/it] + +{'loss': 0.4818, 'learning_rate': 1.1339349452312498e-05, 'epoch': 0.47} + + 47%|████▋ | 3494/7378 [11:59:29<13:17:05, 12.31s/it] + 47%|████▋ | 3495/7378 [11:59:42<13:19:28, 12.35s/it] + +{'loss': 0.5133, 'learning_rate': 1.1334998726260806e-05, 'epoch': 0.47} + + 47%|████▋ | 3495/7378 [11:59:42<13:19:28, 12.35s/it] + 47%|████▋ | 3496/7378 [11:59:54<13:13:57, 12.27s/it] + +{'loss': 0.4634, 'learning_rate': 1.1330647742909035e-05, 'epoch': 0.47} + + 47%|████▋ | 3496/7378 [11:59:54<13:13:57, 12.27s/it] + 47%|████▋ | 3497/7378 [12:00:06<13:15:57, 12.31s/it] + +{'loss': 0.446, 'learning_rate': 1.1326296503095762e-05, 'epoch': 0.47} + + 47%|████▋ | 3497/7378 [12:00:06<13:15:57, 12.31s/it] + 47%|████▋ | 3498/7378 [12:00:18<13:10:31, 12.22s/it] + +{'loss': 0.4749, 'learning_rate': 1.1321945007659625e-05, 'epoch': 0.47} + + 47%|████▋ | 3498/7378 [12:00:18<13:10:31, 12.22s/it] + 47%|████▋ | 3499/7378 [12:00:31<13:14:21, 12.29s/it] + +{'loss': 0.4449, 'learning_rate': 1.1317593257439305e-05, 'epoch': 0.47} + + 47%|████▋ | 3499/7378 [12:00:31<13:14:21, 12.29s/it] + 47%|████▋ | 3500/7378 [12:00:43<13:20:33, 12.39s/it] + +{'loss': 0.5074, 'learning_rate': 1.131324125327353e-05, 'epoch': 0.47} + + 47%|████▋ | 3500/7378 [12:00:43<13:20:33, 12.39s/it] + 47%|████▋ | 3501/7378 [12:00:55<13:15:18, 12.31s/it] + +{'loss': 0.4204, 'learning_rate': 1.1308888996001089e-05, 'epoch': 0.47} + + 47%|████▋ | 3501/7378 [12:00:55<13:15:18, 12.31s/it] + 47%|████▋ | 3502/7378 [12:01:08<13:21:05, 12.40s/it] + +{'loss': 0.4564, 'learning_rate': 1.1304536486460805e-05, 'epoch': 0.47} + + 47%|████▋ | 3502/7378 [12:01:08<13:21:05, 12.40s/it] + 47%|████▋ | 3503/7378 [12:01:20<13:16:38, 12.34s/it] + +{'loss': 0.4586, 'learning_rate': 1.1300183725491555e-05, 'epoch': 0.47} + + 47%|████▋ | 3503/7378 [12:01:20<13:16:38, 12.34s/it] + 47%|████▋ | 3504/7378 [12:01:32<13:13:37, 12.29s/it] + +{'loss': 0.4225, 
'learning_rate': 1.129583071393227e-05, 'epoch': 0.47} + + 47%|████▋ | 3504/7378 [12:01:32<13:13:37, 12.29s/it] + 48%|████▊ | 3505/7378 [12:01:45<13:23:03, 12.44s/it] + +{'loss': 0.4273, 'learning_rate': 1.1291477452621924e-05, 'epoch': 0.48} + + 48%|████▊ | 3505/7378 [12:01:45<13:23:03, 12.44s/it] + 48%|████▊ | 3506/7378 [12:01:58<13:22:55, 12.44s/it] + +{'loss': 0.4491, 'learning_rate': 1.1287123942399537e-05, 'epoch': 0.48} + + 48%|████▊ | 3506/7378 [12:01:58<13:22:55, 12.44s/it] + 48%|████▊ | 3507/7378 [12:02:10<13:21:49, 12.43s/it] + +{'loss': 0.4603, 'learning_rate': 1.128277018410418e-05, 'epoch': 0.48} + + 48%|████▊ | 3507/7378 [12:02:10<13:21:49, 12.43s/it] + 48%|████▊ | 3508/7378 [12:02:22<13:15:06, 12.33s/it] + +{'loss': 0.5258, 'learning_rate': 1.1278416178574976e-05, 'epoch': 0.48} + + 48%|████▊ | 3508/7378 [12:02:22<13:15:06, 12.33s/it] + 48%|████▊ | 3509/7378 [12:02:34<13:09:22, 12.24s/it] + +{'loss': 0.4316, 'learning_rate': 1.1274061926651086e-05, 'epoch': 0.48} + + 48%|████▊ | 3509/7378 [12:02:34<13:09:22, 12.24s/it] + 48%|████▊ | 3510/7378 [12:02:46<13:06:03, 12.19s/it] + +{'loss': 0.4734, 'learning_rate': 1.1269707429171727e-05, 'epoch': 0.48} + + 48%|████▊ | 3510/7378 [12:02:46<13:06:03, 12.19s/it] + 48%|████▊ | 3511/7378 [12:02:59<13:09:58, 12.26s/it] + +{'loss': 0.5254, 'learning_rate': 1.1265352686976161e-05, 'epoch': 0.48} + + 48%|████▊ | 3511/7378 [12:02:59<13:09:58, 12.26s/it] + 48%|████▊ | 3512/7378 [12:03:11<13:06:01, 12.20s/it] + +{'loss': 0.5163, 'learning_rate': 1.1260997700903695e-05, 'epoch': 0.48} + + 48%|████▊ | 3512/7378 [12:03:11<13:06:01, 12.20s/it] + 48%|████▊ | 3513/7378 [12:03:23<13:05:02, 12.19s/it] + +{'loss': 0.4666, 'learning_rate': 1.1256642471793684e-05, 'epoch': 0.48} + + 48%|████▊ | 3513/7378 [12:03:23<13:05:02, 12.19s/it] + 48%|████▊ | 3514/7378 [12:03:35<13:04:03, 12.17s/it] + +{'loss': 0.5248, 'learning_rate': 1.125228700048553e-05, 'epoch': 0.48} + + 48%|████▊ | 3514/7378 [12:03:35<13:04:03, 12.17s/it] + 
48%|████▊ | 3515/7378 [12:03:47<13:07:14, 12.23s/it] + +{'loss': 0.4477, 'learning_rate': 1.1247931287818681e-05, 'epoch': 0.48} + + 48%|████▊ | 3515/7378 [12:03:47<13:07:14, 12.23s/it] + 48%|████▊ | 3516/7378 [12:04:00<13:10:44, 12.28s/it] + +{'loss': 0.5035, 'learning_rate': 1.1243575334632633e-05, 'epoch': 0.48} + + 48%|████▊ | 3516/7378 [12:04:00<13:10:44, 12.28s/it] + 48%|████▊ | 3517/7378 [12:04:12<13:05:00, 12.20s/it] + +{'loss': 0.4477, 'learning_rate': 1.1239219141766931e-05, 'epoch': 0.48} + + 48%|████▊ | 3517/7378 [12:04:12<13:05:00, 12.20s/it] + 48%|████▊ | 3518/7378 [12:04:24<13:10:53, 12.29s/it] + +{'loss': 0.4599, 'learning_rate': 1.1234862710061156e-05, 'epoch': 0.48} + + 48%|████▊ | 3518/7378 [12:04:24<13:10:53, 12.29s/it] + 48%|████▊ | 3519/7378 [12:04:37<13:18:11, 12.41s/it] + +{'loss': 0.5032, 'learning_rate': 1.1230506040354952e-05, 'epoch': 0.48} + + 48%|████▊ | 3519/7378 [12:04:37<13:18:11, 12.41s/it] + 48%|████▊ | 3520/7378 [12:04:49<13:14:15, 12.35s/it] + +{'loss': 0.5097, 'learning_rate': 1.1226149133487986e-05, 'epoch': 0.48} + + 48%|████▊ | 3520/7378 [12:04:49<13:14:15, 12.35s/it] + 48%|████▊ | 3521/7378 [12:05:01<13:12:00, 12.32s/it] + +{'loss': 0.4733, 'learning_rate': 1.1221791990299995e-05, 'epoch': 0.48} + + 48%|████▊ | 3521/7378 [12:05:02<13:12:00, 12.32s/it] + 48%|████▊ | 3522/7378 [12:05:14<13:09:01, 12.28s/it] + +{'loss': 0.4513, 'learning_rate': 1.1217434611630746e-05, 'epoch': 0.48} + + 48%|████▊ | 3522/7378 [12:05:14<13:09:01, 12.28s/it] + 48%|████▊ | 3523/7378 [12:05:26<13:05:47, 12.23s/it] + +{'loss': 0.4712, 'learning_rate': 1.1213076998320052e-05, 'epoch': 0.48} + + 48%|████▊ | 3523/7378 [12:05:26<13:05:47, 12.23s/it] + 48%|████▊ | 3524/7378 [12:05:38<13:14:05, 12.36s/it] + +{'loss': 0.4196, 'learning_rate': 1.1208719151207779e-05, 'epoch': 0.48} + + 48%|████▊ | 3524/7378 [12:05:38<13:14:05, 12.36s/it] + 48%|████▊ | 3525/7378 [12:05:51<13:16:13, 12.40s/it] + +{'loss': 0.367, 'learning_rate': 1.1204361071133831e-05, 
'epoch': 0.48} + + 48%|████▊ | 3525/7378 [12:05:51<13:16:13, 12.40s/it] + 48%|████▊ | 3526/7378 [12:06:03<13:09:19, 12.29s/it] + +{'loss': 0.4631, 'learning_rate': 1.1200002758938161e-05, 'epoch': 0.48} + + 48%|████▊ | 3526/7378 [12:06:03<13:09:19, 12.29s/it] + 48%|████▊ | 3527/7378 [12:06:15<13:08:47, 12.29s/it] + +{'loss': 0.4652, 'learning_rate': 1.1195644215460766e-05, 'epoch': 0.48} + + 48%|████▊ | 3527/7378 [12:06:15<13:08:47, 12.29s/it] + 48%|████▊ | 3528/7378 [12:06:28<13:17:02, 12.42s/it] + +{'loss': 0.451, 'learning_rate': 1.1191285441541687e-05, 'epoch': 0.48} + + 48%|████▊ | 3528/7378 [12:06:28<13:17:02, 12.42s/it] + 48%|████▊ | 3529/7378 [12:06:40<13:09:50, 12.31s/it] + +{'loss': 0.5238, 'learning_rate': 1.1186926438021006e-05, 'epoch': 0.48} + + 48%|████▊ | 3529/7378 [12:06:40<13:09:50, 12.31s/it] + 48%|████▊ | 3530/7378 [12:06:52<13:11:08, 12.34s/it] + +{'loss': 0.4756, 'learning_rate': 1.1182567205738856e-05, 'epoch': 0.48} + + 48%|████▊ | 3530/7378 [12:06:52<13:11:08, 12.34s/it] + 48%|████▊ | 3531/7378 [12:07:05<13:11:40, 12.35s/it] + +{'loss': 0.5097, 'learning_rate': 1.1178207745535415e-05, 'epoch': 0.48} + + 48%|████▊ | 3531/7378 [12:07:05<13:11:40, 12.35s/it] + 48%|████▊ | 3532/7378 [12:07:17<13:03:01, 12.22s/it] + +{'loss': 0.4672, 'learning_rate': 1.1173848058250889e-05, 'epoch': 0.48} + + 48%|████▊ | 3532/7378 [12:07:17<13:03:01, 12.22s/it] + 48%|████▊ | 3533/7378 [12:07:29<12:57:34, 12.13s/it] + +{'loss': 0.4378, 'learning_rate': 1.116948814472555e-05, 'epoch': 0.48} + + 48%|████▊ | 3533/7378 [12:07:29<12:57:34, 12.13s/it] + 48%|████▊ | 3534/7378 [12:07:41<13:02:00, 12.21s/it] + +{'loss': 0.5451, 'learning_rate': 1.1165128005799696e-05, 'epoch': 0.48} + + 48%|████▊ | 3534/7378 [12:07:41<13:02:00, 12.21s/it] + 48%|████▊ | 3535/7378 [12:07:54<13:07:46, 12.30s/it] + +{'loss': 0.4506, 'learning_rate': 1.1160767642313681e-05, 'epoch': 0.48} + + 48%|████▊ | 3535/7378 [12:07:54<13:07:46, 12.30s/it] + 48%|████▊ | 3536/7378 [12:08:06<13:09:49, 
12.33s/it] + +{'loss': 0.4518, 'learning_rate': 1.1156407055107894e-05, 'epoch': 0.48} + + 48%|████▊ | 3536/7378 [12:08:06<13:09:49, 12.33s/it] + 48%|████▊ | 3537/7378 [12:08:18<13:05:11, 12.27s/it] + +{'loss': 0.3761, 'learning_rate': 1.1152046245022767e-05, 'epoch': 0.48} + + 48%|████▊ | 3537/7378 [12:08:18<13:05:11, 12.27s/it] + 48%|████▊ | 3538/7378 [12:08:30<13:04:57, 12.27s/it] + +{'loss': 0.4769, 'learning_rate': 1.1147685212898784e-05, 'epoch': 0.48} + + 48%|████▊ | 3538/7378 [12:08:30<13:04:57, 12.27s/it] + 48%|████▊ | 3539/7378 [12:08:43<13:06:15, 12.29s/it] + +{'loss': 0.5429, 'learning_rate': 1.114332395957646e-05, 'epoch': 0.48} + + 48%|████▊ | 3539/7378 [12:08:43<13:06:15, 12.29s/it] + 48%|████▊ | 3540/7378 [12:08:55<13:00:12, 12.20s/it] + +{'loss': 0.4573, 'learning_rate': 1.1138962485896363e-05, 'epoch': 0.48} + + 48%|████▊ | 3540/7378 [12:08:55<13:00:12, 12.20s/it] + 48%|████▊ | 3541/7378 [12:09:07<13:06:06, 12.29s/it] + +{'loss': 0.4615, 'learning_rate': 1.1134600792699092e-05, 'epoch': 0.48} + + 48%|████▊ | 3541/7378 [12:09:07<13:06:06, 12.29s/it] + 48%|████▊ | 3542/7378 [12:09:19<13:03:31, 12.26s/it] + +{'loss': 0.4694, 'learning_rate': 1.1130238880825306e-05, 'epoch': 0.48} + + 48%|████▊ | 3542/7378 [12:09:19<13:03:31, 12.26s/it] + 48%|████▊ | 3543/7378 [12:09:32<13:02:03, 12.24s/it] + +{'loss': 0.4412, 'learning_rate': 1.1125876751115686e-05, 'epoch': 0.48} + + 48%|████▊ | 3543/7378 [12:09:32<13:02:03, 12.24s/it] + 48%|████▊ | 3544/7378 [12:09:44<12:59:04, 12.19s/it] + +{'loss': 0.4301, 'learning_rate': 1.1121514404410965e-05, 'epoch': 0.48} + + 48%|████▊ | 3544/7378 [12:09:44<12:59:04, 12.19s/it] + 48%|████▊ | 3545/7378 [12:09:56<12:56:55, 12.16s/it] + +{'loss': 0.4402, 'learning_rate': 1.111715184155192e-05, 'epoch': 0.48} + + 48%|████▊ | 3545/7378 [12:09:56<12:56:55, 12.16s/it] + 48%|████▊ | 3546/7378 [12:10:08<12:54:34, 12.13s/it] + +{'loss': 0.5041, 'learning_rate': 1.1112789063379362e-05, 'epoch': 0.48} + + 48%|████▊ | 3546/7378 
[12:10:08<12:54:34, 12.13s/it] + 48%|████▊ | 3547/7378 [12:10:20<12:54:14, 12.13s/it] + +{'loss': 0.4713, 'learning_rate': 1.1108426070734156e-05, 'epoch': 0.48} + + 48%|████▊ | 3547/7378 [12:10:20<12:54:14, 12.13s/it] + 48%|████▊ | 3548/7378 [12:10:32<12:59:33, 12.21s/it] + +{'loss': 0.3666, 'learning_rate': 1.110406286445719e-05, 'epoch': 0.48} + + 48%|████▊ | 3548/7378 [12:10:32<12:59:33, 12.21s/it] + 48%|████▊ | 3549/7378 [12:10:45<13:01:11, 12.24s/it] + +{'loss': 0.4849, 'learning_rate': 1.1099699445389416e-05, 'epoch': 0.48} + + 48%|████▊ | 3549/7378 [12:10:45<13:01:11, 12.24s/it] + 48%|████▊ | 3550/7378 [12:10:57<13:09:34, 12.38s/it] + +{'loss': 0.3963, 'learning_rate': 1.1095335814371803e-05, 'epoch': 0.48} + + 48%|████▊ | 3550/7378 [12:10:57<13:09:34, 12.38s/it] + 48%|████▊ | 3551/7378 [12:11:09<13:05:29, 12.32s/it] + +{'loss': 0.4738, 'learning_rate': 1.109097197224538e-05, 'epoch': 0.48} + + 48%|████▊ | 3551/7378 [12:11:09<13:05:29, 12.32s/it] + 48%|████▊ | 3552/7378 [12:11:22<13:10:30, 12.40s/it] + +{'loss': 0.4976, 'learning_rate': 1.1086607919851205e-05, 'epoch': 0.48} + + 48%|████▊ | 3552/7378 [12:11:22<13:10:30, 12.40s/it] + 48%|████▊ | 3553/7378 [12:11:34<13:02:55, 12.28s/it] + +{'loss': 0.4627, 'learning_rate': 1.1082243658030382e-05, 'epoch': 0.48} + + 48%|████▊ | 3553/7378 [12:11:34<13:02:55, 12.28s/it] + 48%|████▊ | 3554/7378 [12:11:46<13:05:00, 12.32s/it] + +{'loss': 0.4789, 'learning_rate': 1.107787918762406e-05, 'epoch': 0.48} + + 48%|████▊ | 3554/7378 [12:11:46<13:05:00, 12.32s/it] + 48%|████▊ | 3555/7378 [12:11:59<13:15:33, 12.49s/it] + +{'loss': 0.517, 'learning_rate': 1.107351450947341e-05, 'epoch': 0.48} + + 48%|████▊ | 3555/7378 [12:11:59<13:15:33, 12.49s/it] + 48%|████▊ | 3556/7378 [12:12:12<13:14:35, 12.47s/it] + +{'loss': 0.4986, 'learning_rate': 1.1069149624419666e-05, 'epoch': 0.48} + + 48%|████▊ | 3556/7378 [12:12:12<13:14:35, 12.47s/it] + 48%|████▊ | 3557/7378 [12:12:24<13:07:16, 12.36s/it] + +{'loss': 0.3525, 'learning_rate': 
1.1064784533304087e-05, 'epoch': 0.48} + + 48%|████▊ | 3557/7378 [12:12:24<13:07:16, 12.36s/it] + 48%|████▊ | 3558/7378 [12:12:36<13:05:02, 12.33s/it] + +{'loss': 0.458, 'learning_rate': 1.1060419236967974e-05, 'epoch': 0.48} + + 48%|████▊ | 3558/7378 [12:12:36<13:05:02, 12.33s/it] + 48%|████▊ | 3559/7378 [12:12:48<13:04:29, 12.32s/it] + +{'loss': 0.4783, 'learning_rate': 1.1056053736252675e-05, 'epoch': 0.48} + + 48%|████▊ | 3559/7378 [12:12:48<13:04:29, 12.32s/it] + 48%|████▊ | 3560/7378 [12:13:00<12:57:12, 12.21s/it] + +{'loss': 0.445, 'learning_rate': 1.1051688031999565e-05, 'epoch': 0.48} + + 48%|████▊ | 3560/7378 [12:13:00<12:57:12, 12.21s/it] + 48%|████▊ | 3561/7378 [12:13:12<12:53:45, 12.16s/it] + +{'loss': 0.4633, 'learning_rate': 1.1047322125050071e-05, 'epoch': 0.48} + + 48%|████▊ | 3561/7378 [12:13:12<12:53:45, 12.16s/it] + 48%|████▊ | 3562/7378 [12:13:25<12:59:30, 12.26s/it] + +{'loss': 0.4823, 'learning_rate': 1.104295601624565e-05, 'epoch': 0.48} + + 48%|████▊ | 3562/7378 [12:13:25<12:59:30, 12.26s/it] + 48%|████▊ | 3563/7378 [12:13:37<13:01:10, 12.29s/it] + +{'loss': 0.4785, 'learning_rate': 1.1038589706427802e-05, 'epoch': 0.48} + + 48%|████▊ | 3563/7378 [12:13:37<13:01:10, 12.29s/it] + 48%|████▊ | 3564/7378 [12:13:50<13:01:43, 12.30s/it] + +{'loss': 0.4722, 'learning_rate': 1.1034223196438065e-05, 'epoch': 0.48} + + 48%|████▊ | 3564/7378 [12:13:50<13:01:43, 12.30s/it] + 48%|████▊ | 3565/7378 [12:14:02<12:57:17, 12.23s/it] + +{'loss': 0.3906, 'learning_rate': 1.1029856487118013e-05, 'epoch': 0.48} + + 48%|████▊ | 3565/7378 [12:14:02<12:57:17, 12.23s/it] + 48%|████▊ | 3566/7378 [12:14:14<12:54:50, 12.20s/it] + +{'loss': 0.3775, 'learning_rate': 1.1025489579309265e-05, 'epoch': 0.48} + + 48%|████▊ | 3566/7378 [12:14:14<12:54:50, 12.20s/it] + 48%|████▊ | 3567/7378 [12:14:26<12:54:16, 12.19s/it] + +{'loss': 0.419, 'learning_rate': 1.1021122473853469e-05, 'epoch': 0.48} + + 48%|████▊ | 3567/7378 [12:14:26<12:54:16, 12.19s/it] + 48%|████▊ | 3568/7378 
[12:14:38<12:54:24, 12.20s/it] + +{'loss': 0.4185, 'learning_rate': 1.1016755171592322e-05, 'epoch': 0.48} + + 48%|████▊ | 3568/7378 [12:14:38<12:54:24, 12.20s/it] + 48%|████▊ | 3569/7378 [12:14:50<12:49:56, 12.13s/it] + +{'loss': 0.481, 'learning_rate': 1.1012387673367547e-05, 'epoch': 0.48} + + 48%|████▊ | 3569/7378 [12:14:50<12:49:56, 12.13s/it] + 48%|████▊ | 3570/7378 [12:15:02<12:52:58, 12.18s/it] + +{'loss': 0.3806, 'learning_rate': 1.1008019980020917e-05, 'epoch': 0.48} + + 48%|████▊ | 3570/7378 [12:15:02<12:52:58, 12.18s/it] + 48%|████▊ | 3571/7378 [12:15:15<12:56:04, 12.23s/it] + +{'loss': 0.4863, 'learning_rate': 1.1003652092394228e-05, 'epoch': 0.48} + + 48%|████▊ | 3571/7378 [12:15:15<12:56:04, 12.23s/it] + 48%|████▊ | 3572/7378 [12:15:27<12:57:50, 12.26s/it] + +{'loss': 0.4806, 'learning_rate': 1.099928401132933e-05, 'epoch': 0.48} + + 48%|████▊ | 3572/7378 [12:15:27<12:57:50, 12.26s/it] + 48%|████▊ | 3573/7378 [12:15:40<13:12:21, 12.49s/it] + +{'loss': 0.5129, 'learning_rate': 1.0994915737668102e-05, 'epoch': 0.48} + + 48%|████▊ | 3573/7378 [12:15:40<13:12:21, 12.49s/it] + 48%|████▊ | 3574/7378 [12:15:52<13:05:48, 12.39s/it] + +{'loss': 0.4252, 'learning_rate': 1.0990547272252454e-05, 'epoch': 0.48} + + 48%|████▊ | 3574/7378 [12:15:52<13:05:48, 12.39s/it] + 48%|████▊ | 3575/7378 [12:16:04<12:59:09, 12.29s/it] + +{'loss': 0.4035, 'learning_rate': 1.0986178615924346e-05, 'epoch': 0.48} + + 48%|████▊ | 3575/7378 [12:16:04<12:59:09, 12.29s/it] + 48%|████▊ | 3576/7378 [12:16:17<12:59:50, 12.31s/it] + +{'loss': 0.4871, 'learning_rate': 1.098180976952576e-05, 'epoch': 0.48} + + 48%|████▊ | 3576/7378 [12:16:17<12:59:50, 12.31s/it] + 48%|████▊ | 3577/7378 [12:16:29<13:01:16, 12.33s/it] + +{'loss': 0.4808, 'learning_rate': 1.0977440733898733e-05, 'epoch': 0.48} + + 48%|████▊ | 3577/7378 [12:16:29<13:01:16, 12.33s/it] + 48%|████▊ | 3578/7378 [12:16:42<13:04:52, 12.39s/it] + +{'loss': 0.4567, 'learning_rate': 1.097307150988532e-05, 'epoch': 0.48} + + 48%|████▊ | 
3578/7378 [12:16:42<13:04:52, 12.39s/it] + 49%|████▊ | 3579/7378 [12:16:54<13:05:30, 12.41s/it] + +{'loss': 0.4684, 'learning_rate': 1.0968702098327624e-05, 'epoch': 0.49} + + 49%|████▊ | 3579/7378 [12:16:54<13:05:30, 12.41s/it] + 49%|████▊ | 3580/7378 [12:17:06<12:58:07, 12.29s/it] + +{'loss': 0.4245, 'learning_rate': 1.096433250006778e-05, 'epoch': 0.49} + + 49%|████▊ | 3580/7378 [12:17:06<12:58:07, 12.29s/it] + 49%|████▊ | 3581/7378 [12:17:18<12:53:43, 12.23s/it] + +{'loss': 0.4757, 'learning_rate': 1.0959962715947956e-05, 'epoch': 0.49} + + 49%|████▊ | 3581/7378 [12:17:18<12:53:43, 12.23s/it] + 49%|████▊ | 3582/7378 [12:17:31<12:58:02, 12.30s/it] + +{'loss': 0.465, 'learning_rate': 1.0955592746810366e-05, 'epoch': 0.49} + + 49%|████▊ | 3582/7378 [12:17:31<12:58:02, 12.30s/it] + 49%|████▊ | 3583/7378 [12:17:43<12:57:45, 12.30s/it] + +{'loss': 0.4381, 'learning_rate': 1.0951222593497248e-05, 'epoch': 0.49} + + 49%|████▊ | 3583/7378 [12:17:43<12:57:45, 12.30s/it] + 49%|████▊ | 3584/7378 [12:17:55<12:52:47, 12.22s/it] + +{'loss': 0.4955, 'learning_rate': 1.0946852256850887e-05, 'epoch': 0.49} + + 49%|████▊ | 3584/7378 [12:17:55<12:52:47, 12.22s/it] + 49%|████▊ | 3585/7378 [12:18:07<12:48:52, 12.16s/it] + +{'loss': 0.4843, 'learning_rate': 1.0942481737713588e-05, 'epoch': 0.49} + + 49%|████▊ | 3585/7378 [12:18:07<12:48:52, 12.16s/it] + 49%|████▊ | 3586/7378 [12:18:20<12:55:21, 12.27s/it] + +{'loss': 0.5033, 'learning_rate': 1.0938111036927705e-05, 'epoch': 0.49} + + 49%|████▊ | 3586/7378 [12:18:20<12:55:21, 12.27s/it] + 49%|████▊ | 3587/7378 [12:18:32<12:58:09, 12.32s/it] + +{'loss': 0.4552, 'learning_rate': 1.0933740155335622e-05, 'epoch': 0.49} + + 49%|████▊ | 3587/7378 [12:18:32<12:58:09, 12.32s/it] + 49%|████▊ | 3588/7378 [12:18:44<13:00:26, 12.36s/it] + +{'loss': 0.4266, 'learning_rate': 1.0929369093779755e-05, 'epoch': 0.49} + + 49%|████▊ | 3588/7378 [12:18:44<13:00:26, 12.36s/it] + 49%|████▊ | 3589/7378 [12:18:57<12:58:27, 12.33s/it] + +{'loss': 0.4543, 
'learning_rate': 1.0924997853102563e-05, 'epoch': 0.49} + + 49%|████▊ | 3589/7378 [12:18:57<12:58:27, 12.33s/it] + 49%|████▊ | 3590/7378 [12:19:09<13:04:51, 12.43s/it] + +{'loss': 0.5297, 'learning_rate': 1.0920626434146528e-05, 'epoch': 0.49} + + 49%|████▊ | 3590/7378 [12:19:09<13:04:51, 12.43s/it] + 49%|████▊ | 3591/7378 [12:19:22<13:07:03, 12.47s/it] + +{'loss': 0.4753, 'learning_rate': 1.091625483775418e-05, 'epoch': 0.49} + + 49%|████▊ | 3591/7378 [12:19:22<13:07:03, 12.47s/it] + 49%|████▊ | 3592/7378 [12:19:34<12:59:06, 12.35s/it] + +{'loss': 0.4771, 'learning_rate': 1.0911883064768068e-05, 'epoch': 0.49} + + 49%|████▊ | 3592/7378 [12:19:34<12:59:06, 12.35s/it] + 49%|████▊ | 3593/7378 [12:19:46<12:53:27, 12.26s/it] + +{'loss': 0.53, 'learning_rate': 1.0907511116030785e-05, 'epoch': 0.49} + + 49%|████▊ | 3593/7378 [12:19:46<12:53:27, 12.26s/it] + 49%|████▊ | 3594/7378 [12:19:58<12:51:41, 12.24s/it] + +{'loss': 0.4654, 'learning_rate': 1.0903138992384961e-05, 'epoch': 0.49} + + 49%|████▊ | 3594/7378 [12:19:58<12:51:41, 12.24s/it] + 49%|████▊ | 3595/7378 [12:20:10<12:46:51, 12.16s/it] + +{'loss': 0.4582, 'learning_rate': 1.0898766694673247e-05, 'epoch': 0.49} + + 49%|████▊ | 3595/7378 [12:20:10<12:46:51, 12.16s/it] + 49%|████▊ | 3596/7378 [12:20:23<12:52:18, 12.25s/it] + +{'loss': 0.5441, 'learning_rate': 1.0894394223738338e-05, 'epoch': 0.49} + + 49%|████▊ | 3596/7378 [12:20:23<12:52:18, 12.25s/it] + 49%|████▉ | 3597/7378 [12:20:35<12:57:55, 12.34s/it] + +{'loss': 0.4255, 'learning_rate': 1.0890021580422957e-05, 'epoch': 0.49} + + 49%|████▉ | 3597/7378 [12:20:35<12:57:55, 12.34s/it] + 49%|████▉ | 3598/7378 [12:20:47<12:55:05, 12.30s/it] + +{'loss': 0.4534, 'learning_rate': 1.0885648765569868e-05, 'epoch': 0.49} + + 49%|████▉ | 3598/7378 [12:20:47<12:55:05, 12.30s/it] + 49%|████▉ | 3599/7378 [12:21:00<12:52:14, 12.26s/it] + +{'loss': 0.4901, 'learning_rate': 1.0881275780021859e-05, 'epoch': 0.49} + + 49%|████▉ | 3599/7378 [12:21:00<12:52:14, 12.26s/it] + 
49%|████▉ | 3600/7378 [12:21:12<12:49:15, 12.22s/it] + +{'loss': 0.4756, 'learning_rate': 1.0876902624621753e-05, 'epoch': 0.49} + + 49%|████▉ | 3600/7378 [12:21:12<12:49:15, 12.22s/it] + 49%|████▉ | 3601/7378 [12:21:24<12:57:03, 12.34s/it] + +{'loss': 0.4344, 'learning_rate': 1.087252930021241e-05, 'epoch': 0.49} + + 49%|████▉ | 3601/7378 [12:21:24<12:57:03, 12.34s/it] + 49%|████▉ | 3602/7378 [12:21:36<12:49:45, 12.23s/it] + +{'loss': 0.478, 'learning_rate': 1.0868155807636715e-05, 'epoch': 0.49} + + 49%|████▉ | 3602/7378 [12:21:36<12:49:45, 12.23s/it] + 49%|████▉ | 3603/7378 [12:21:49<12:49:00, 12.22s/it] + +{'loss': 0.3884, 'learning_rate': 1.0863782147737598e-05, 'epoch': 0.49} + + 49%|████▉ | 3603/7378 [12:21:49<12:49:00, 12.22s/it] + 49%|████▉ | 3604/7378 [12:22:01<12:46:57, 12.19s/it] + +{'loss': 0.4531, 'learning_rate': 1.0859408321358005e-05, 'epoch': 0.49} + + 49%|████▉ | 3604/7378 [12:22:01<12:46:57, 12.19s/it] + 49%|████▉ | 3605/7378 [12:22:14<12:59:57, 12.40s/it] + +{'loss': 0.5179, 'learning_rate': 1.085503432934093e-05, 'epoch': 0.49} + + 49%|████▉ | 3605/7378 [12:22:14<12:59:57, 12.40s/it] + 49%|████▉ | 3606/7378 [12:22:26<12:59:19, 12.40s/it] + +{'loss': 0.4869, 'learning_rate': 1.0850660172529383e-05, 'epoch': 0.49} + + 49%|████▉ | 3606/7378 [12:22:26<12:59:19, 12.40s/it] + 49%|████▉ | 3607/7378 [12:22:38<12:59:51, 12.41s/it] + +{'loss': 0.4488, 'learning_rate': 1.0846285851766425e-05, 'epoch': 0.49} + + 49%|████▉ | 3607/7378 [12:22:38<12:59:51, 12.41s/it] + 49%|████▉ | 3608/7378 [12:22:51<12:55:44, 12.35s/it] + +{'loss': 0.4949, 'learning_rate': 1.084191136789513e-05, 'epoch': 0.49} + + 49%|████▉ | 3608/7378 [12:22:51<12:55:44, 12.35s/it] + 49%|████▉ | 3609/7378 [12:23:02<12:44:42, 12.17s/it] + +{'loss': 0.4626, 'learning_rate': 1.083753672175861e-05, 'epoch': 0.49} + + 49%|████▉ | 3609/7378 [12:23:02<12:44:42, 12.17s/it] + 49%|████▉ | 3610/7378 [12:23:15<12:50:29, 12.27s/it] + +{'loss': 0.4503, 'learning_rate': 1.0833161914200017e-05, 'epoch': 
0.49} + + 49%|████▉ | 3610/7378 [12:23:15<12:50:29, 12.27s/it] + 49%|████▉ | 3611/7378 [12:23:27<12:45:12, 12.19s/it] + +{'loss': 0.4467, 'learning_rate': 1.0828786946062517e-05, 'epoch': 0.49} + + 49%|████▉ | 3611/7378 [12:23:27<12:45:12, 12.19s/it] + 49%|████▉ | 3612/7378 [12:23:39<12:45:14, 12.19s/it] + +{'loss': 0.417, 'learning_rate': 1.0824411818189327e-05, 'epoch': 0.49} + + 49%|████▉ | 3612/7378 [12:23:39<12:45:14, 12.19s/it] + 49%|████▉ | 3613/7378 [12:23:51<12:43:23, 12.17s/it] + +{'loss': 0.4033, 'learning_rate': 1.0820036531423675e-05, 'epoch': 0.49} + + 49%|████▉ | 3613/7378 [12:23:51<12:43:23, 12.17s/it] + 49%|████▉ | 3614/7378 [12:24:03<12:40:34, 12.12s/it] + +{'loss': 0.4445, 'learning_rate': 1.0815661086608835e-05, 'epoch': 0.49} + + 49%|████▉ | 3614/7378 [12:24:03<12:40:34, 12.12s/it] + 49%|████▉ | 3615/7378 [12:24:15<12:44:47, 12.19s/it] + +{'loss': 0.5365, 'learning_rate': 1.0811285484588101e-05, 'epoch': 0.49} + + 49%|████▉ | 3615/7378 [12:24:15<12:44:47, 12.19s/it] + 49%|████▉ | 3616/7378 [12:24:28<12:49:23, 12.27s/it] + +{'loss': 0.5111, 'learning_rate': 1.0806909726204805e-05, 'epoch': 0.49} + + 49%|████▉ | 3616/7378 [12:24:28<12:49:23, 12.27s/it] + 49%|████▉ | 3617/7378 [12:24:40<12:53:50, 12.35s/it] + +{'loss': 0.409, 'learning_rate': 1.0802533812302305e-05, 'epoch': 0.49} + + 49%|████▉ | 3617/7378 [12:24:40<12:53:50, 12.35s/it] + 49%|████▉ | 3618/7378 [12:24:53<12:52:21, 12.32s/it] + +{'loss': 0.4323, 'learning_rate': 1.079815774372399e-05, 'epoch': 0.49} + + 49%|████▉ | 3618/7378 [12:24:53<12:52:21, 12.32s/it] + 49%|████▉ | 3619/7378 [12:25:05<12:50:05, 12.29s/it] + +{'loss': 0.4477, 'learning_rate': 1.079378152131328e-05, 'epoch': 0.49} + + 49%|████▉ | 3619/7378 [12:25:05<12:50:05, 12.29s/it] + 49%|████▉ | 3620/7378 [12:25:17<12:53:17, 12.35s/it] + +{'loss': 0.4735, 'learning_rate': 1.078940514591362e-05, 'epoch': 0.49} + + 49%|████▉ | 3620/7378 [12:25:17<12:53:17, 12.35s/it] + 49%|████▉ | 3621/7378 [12:25:30<12:55:34, 12.39s/it] + 
+{'loss': 0.5197, 'learning_rate': 1.078502861836849e-05, 'epoch': 0.49} + + 49%|████▉ | 3621/7378 [12:25:30<12:55:34, 12.39s/it] + 49%|████▉ | 3622/7378 [12:25:43<12:59:49, 12.46s/it] + +{'loss': 0.4133, 'learning_rate': 1.0780651939521396e-05, 'epoch': 0.49} + + 49%|████▉ | 3622/7378 [12:25:43<12:59:49, 12.46s/it] + 49%|████▉ | 3623/7378 [12:25:55<12:56:45, 12.41s/it] + +{'loss': 0.4521, 'learning_rate': 1.0776275110215875e-05, 'epoch': 0.49} + + 49%|████▉ | 3623/7378 [12:25:55<12:56:45, 12.41s/it] + 49%|████▉ | 3624/7378 [12:26:07<12:54:56, 12.39s/it] + +{'loss': 0.4123, 'learning_rate': 1.0771898131295493e-05, 'epoch': 0.49} + + 49%|████▉ | 3624/7378 [12:26:07<12:54:56, 12.39s/it] + 49%|████▉ | 3625/7378 [12:26:19<12:49:09, 12.30s/it] + +{'loss': 0.4616, 'learning_rate': 1.076752100360384e-05, 'epoch': 0.49} + + 49%|████▉ | 3625/7378 [12:26:19<12:49:09, 12.30s/it] + 49%|████▉ | 3626/7378 [12:26:31<12:46:37, 12.26s/it] + +{'loss': 0.374, 'learning_rate': 1.0763143727984546e-05, 'epoch': 0.49} + + 49%|████▉ | 3626/7378 [12:26:31<12:46:37, 12.26s/it] + 49%|████▉ | 3627/7378 [12:26:44<12:47:20, 12.27s/it] + +{'loss': 0.4964, 'learning_rate': 1.0758766305281257e-05, 'epoch': 0.49} + + 49%|████▉ | 3627/7378 [12:26:44<12:47:20, 12.27s/it] + 49%|████▉ | 3628/7378 [12:26:56<12:48:21, 12.29s/it] + +{'loss': 0.4179, 'learning_rate': 1.0754388736337652e-05, 'epoch': 0.49} + + 49%|████▉ | 3628/7378 [12:26:56<12:48:21, 12.29s/it] + 49%|████▉ | 3629/7378 [12:27:08<12:49:49, 12.32s/it] + +{'loss': 0.3955, 'learning_rate': 1.0750011021997444e-05, 'epoch': 0.49} + + 49%|████▉ | 3629/7378 [12:27:08<12:49:49, 12.32s/it] + 49%|████▉ | 3630/7378 [12:27:21<12:47:02, 12.28s/it] + +{'loss': 0.4397, 'learning_rate': 1.074563316310436e-05, 'epoch': 0.49} + + 49%|████▉ | 3630/7378 [12:27:21<12:47:02, 12.28s/it] + 49%|████▉ | 3631/7378 [12:27:33<12:46:32, 12.27s/it] + +{'loss': 0.4111, 'learning_rate': 1.0741255160502176e-05, 'epoch': 0.49} + + 49%|████▉ | 3631/7378 [12:27:33<12:46:32, 
12.27s/it] + 49%|████▉ | 3632/7378 [12:27:45<12:44:46, 12.25s/it] + +{'loss': 0.4153, 'learning_rate': 1.073687701503467e-05, 'epoch': 0.49} + + 49%|████▉ | 3632/7378 [12:27:45<12:44:46, 12.25s/it] + 49%|████▉ | 3633/7378 [12:27:57<12:41:52, 12.21s/it] + +{'loss': 0.4597, 'learning_rate': 1.0732498727545672e-05, 'epoch': 0.49} + + 49%|████▉ | 3633/7378 [12:27:57<12:41:52, 12.21s/it] + 49%|████▉ | 3634/7378 [12:28:10<12:59:13, 12.49s/it] + +{'loss': 0.4782, 'learning_rate': 1.072812029887902e-05, 'epoch': 0.49} + + 49%|████▉ | 3634/7378 [12:28:10<12:59:13, 12.49s/it] + 49%|████▉ | 3635/7378 [12:28:24<13:12:42, 12.71s/it] + +{'loss': 0.4652, 'learning_rate': 1.0723741729878596e-05, 'epoch': 0.49} + + 49%|████▉ | 3635/7378 [12:28:24<13:12:42, 12.71s/it] + 49%|████▉ | 3636/7378 [12:28:35<12:55:30, 12.43s/it] + +{'loss': 0.3962, 'learning_rate': 1.0719363021388292e-05, 'epoch': 0.49} + + 49%|████▉ | 3636/7378 [12:28:35<12:55:30, 12.43s/it] + 49%|████▉ | 3637/7378 [12:28:48<12:54:04, 12.41s/it] + +{'loss': 0.4378, 'learning_rate': 1.071498417425204e-05, 'epoch': 0.49} + + 49%|████▉ | 3637/7378 [12:28:48<12:54:04, 12.41s/it] + 49%|████▉ | 3638/7378 [12:29:00<12:54:44, 12.43s/it] + +{'loss': 0.4936, 'learning_rate': 1.0710605189313794e-05, 'epoch': 0.49} + + 49%|████▉ | 3638/7378 [12:29:00<12:54:44, 12.43s/it] + 49%|████▉ | 3639/7378 [12:29:12<12:51:22, 12.38s/it] + +{'loss': 0.515, 'learning_rate': 1.0706226067417533e-05, 'epoch': 0.49} + + 49%|████▉ | 3639/7378 [12:29:12<12:51:22, 12.38s/it] + 49%|████▉ | 3640/7378 [12:29:25<12:45:16, 12.28s/it] + +{'loss': 0.4716, 'learning_rate': 1.0701846809407268e-05, 'epoch': 0.49} + + 49%|████▉ | 3640/7378 [12:29:25<12:45:16, 12.28s/it] + 49%|████▉ | 3641/7378 [12:29:37<12:49:35, 12.36s/it] + +{'loss': 0.4623, 'learning_rate': 1.0697467416127028e-05, 'epoch': 0.49} + + 49%|████▉ | 3641/7378 [12:29:37<12:49:35, 12.36s/it] + 49%|████▉ | 3642/7378 [12:29:50<12:55:52, 12.46s/it] + +{'loss': 0.4302, 'learning_rate': 
1.0693087888420875e-05, 'epoch': 0.49} + + 49%|████▉ | 3642/7378 [12:29:50<12:55:52, 12.46s/it] + 49%|████▉ | 3643/7378 [12:30:02<12:48:36, 12.35s/it] + +{'loss': 0.4678, 'learning_rate': 1.0688708227132891e-05, 'epoch': 0.49} + + 49%|████▉ | 3643/7378 [12:30:02<12:48:36, 12.35s/it] + 49%|████▉ | 3644/7378 [12:30:14<12:53:00, 12.42s/it] + +{'loss': 0.4654, 'learning_rate': 1.0684328433107192e-05, 'epoch': 0.49} + + 49%|████▉ | 3644/7378 [12:30:14<12:53:00, 12.42s/it] + 49%|████▉ | 3645/7378 [12:30:27<12:47:39, 12.34s/it] + +{'loss': 0.4212, 'learning_rate': 1.0679948507187912e-05, 'epoch': 0.49} + + 49%|████▉ | 3645/7378 [12:30:27<12:47:39, 12.34s/it] + 49%|████▉ | 3646/7378 [12:30:39<12:40:02, 12.22s/it] + +{'loss': 0.4336, 'learning_rate': 1.0675568450219208e-05, 'epoch': 0.49} + + 49%|████▉ | 3646/7378 [12:30:39<12:40:02, 12.22s/it] + 49%|████▉ | 3647/7378 [12:30:51<12:46:31, 12.33s/it] + +{'loss': 0.4682, 'learning_rate': 1.067118826304528e-05, 'epoch': 0.49} + + 49%|████▉ | 3647/7378 [12:30:51<12:46:31, 12.33s/it] + 49%|████▉ | 3648/7378 [12:31:03<12:39:02, 12.21s/it] + +{'loss': 0.4691, 'learning_rate': 1.0666807946510326e-05, 'epoch': 0.49} + + 49%|████▉ | 3648/7378 [12:31:03<12:39:02, 12.21s/it] + 49%|████▉ | 3649/7378 [12:31:15<12:39:30, 12.22s/it] + +{'loss': 0.4882, 'learning_rate': 1.0662427501458596e-05, 'epoch': 0.49} + + 49%|████▉ | 3649/7378 [12:31:15<12:39:30, 12.22s/it] + 49%|████▉ | 3650/7378 [12:31:27<12:38:13, 12.20s/it] + +{'loss': 0.4722, 'learning_rate': 1.0658046928734346e-05, 'epoch': 0.49} + + 49%|████▉ | 3650/7378 [12:31:27<12:38:13, 12.20s/it] + 49%|████▉ | 3651/7378 [12:31:40<12:40:23, 12.24s/it] + +{'loss': 0.4339, 'learning_rate': 1.065366622918186e-05, 'epoch': 0.49} + + 49%|████▉ | 3651/7378 [12:31:40<12:40:23, 12.24s/it] + 49%|████▉ | 3652/7378 [12:31:52<12:36:10, 12.18s/it] + +{'loss': 0.462, 'learning_rate': 1.0649285403645456e-05, 'epoch': 0.49} + + 49%|████▉ | 3652/7378 [12:31:52<12:36:10, 12.18s/it] + 50%|████▉ | 3653/7378 
[12:32:04<12:33:32, 12.14s/it] + +{'loss': 0.4305, 'learning_rate': 1.0644904452969462e-05, 'epoch': 0.5} + + 50%|████▉ | 3653/7378 [12:32:04<12:33:32, 12.14s/it] + 50%|████▉ | 3654/7378 [12:32:16<12:40:15, 12.25s/it] + +{'loss': 0.473, 'learning_rate': 1.0640523377998245e-05, 'epoch': 0.5} + + 50%|████▉ | 3654/7378 [12:32:16<12:40:15, 12.25s/it] + 50%|████▉ | 3655/7378 [12:32:28<12:33:35, 12.15s/it] + +{'loss': 0.4717, 'learning_rate': 1.0636142179576182e-05, 'epoch': 0.5} + + 50%|████▉ | 3655/7378 [12:32:28<12:33:35, 12.15s/it] + 50%|████▉ | 3656/7378 [12:32:40<12:34:46, 12.17s/it] + +{'loss': 0.393, 'learning_rate': 1.0631760858547687e-05, 'epoch': 0.5} + + 50%|████▉ | 3656/7378 [12:32:40<12:34:46, 12.17s/it] + 50%|████▉ | 3657/7378 [12:32:53<12:38:11, 12.23s/it] + +{'loss': 0.4617, 'learning_rate': 1.0627379415757183e-05, 'epoch': 0.5} + + 50%|████▉ | 3657/7378 [12:32:53<12:38:11, 12.23s/it] + 50%|████▉ | 3658/7378 [12:33:05<12:39:36, 12.25s/it] + +{'loss': 0.5221, 'learning_rate': 1.062299785204913e-05, 'epoch': 0.5} + + 50%|████▉ | 3658/7378 [12:33:05<12:39:36, 12.25s/it] + 50%|████▉ | 3659/7378 [12:33:17<12:33:21, 12.15s/it] + +{'loss': 0.4784, 'learning_rate': 1.0618616168268003e-05, 'epoch': 0.5} + + 50%|████▉ | 3659/7378 [12:33:17<12:33:21, 12.15s/it] + 50%|████▉ | 3660/7378 [12:33:30<12:41:09, 12.28s/it] + +{'loss': 0.4917, 'learning_rate': 1.0614234365258307e-05, 'epoch': 0.5} + + 50%|████▉ | 3660/7378 [12:33:30<12:41:09, 12.28s/it] + 50%|████▉ | 3661/7378 [12:33:42<12:38:08, 12.24s/it] + +{'loss': 0.4301, 'learning_rate': 1.0609852443864563e-05, 'epoch': 0.5} + + 50%|████▉ | 3661/7378 [12:33:42<12:38:08, 12.24s/it] + 50%|████▉ | 3662/7378 [12:33:54<12:43:52, 12.33s/it] + +{'loss': 0.4979, 'learning_rate': 1.0605470404931317e-05, 'epoch': 0.5} + + 50%|████▉ | 3662/7378 [12:33:54<12:43:52, 12.33s/it] + 50%|████▉ | 3663/7378 [12:34:06<12:40:29, 12.28s/it] + +{'loss': 0.4503, 'learning_rate': 1.060108824930314e-05, 'epoch': 0.5} + + 50%|████▉ | 3663/7378 
[12:34:07<12:40:29, 12.28s/it] + 50%|████▉ | 3664/7378 [12:34:19<12:48:39, 12.42s/it] + +{'loss': 0.4278, 'learning_rate': 1.0596705977824624e-05, 'epoch': 0.5} + + 50%|████▉ | 3664/7378 [12:34:19<12:48:39, 12.42s/it] + 50%|████▉ | 3665/7378 [12:34:31<12:42:53, 12.33s/it] + +{'loss': 0.4637, 'learning_rate': 1.0592323591340378e-05, 'epoch': 0.5} + + 50%|████▉ | 3665/7378 [12:34:31<12:42:53, 12.33s/it] + 50%|████▉ | 3666/7378 [12:34:44<12:42:49, 12.33s/it] + +{'loss': 0.4842, 'learning_rate': 1.0587941090695046e-05, 'epoch': 0.5} + + 50%|████▉ | 3666/7378 [12:34:44<12:42:49, 12.33s/it] + 50%|████▉ | 3667/7378 [12:34:56<12:39:59, 12.29s/it] + +{'loss': 0.4325, 'learning_rate': 1.0583558476733282e-05, 'epoch': 0.5} + + 50%|████▉ | 3667/7378 [12:34:56<12:39:59, 12.29s/it] + 50%|████▉ | 3668/7378 [12:35:08<12:43:36, 12.35s/it] + +{'loss': 0.4725, 'learning_rate': 1.0579175750299769e-05, 'epoch': 0.5} + + 50%|████▉ | 3668/7378 [12:35:08<12:43:36, 12.35s/it] + 50%|████▉ | 3669/7378 [12:35:21<12:40:19, 12.30s/it] + +{'loss': 0.4391, 'learning_rate': 1.0574792912239203e-05, 'epoch': 0.5} + + 50%|████▉ | 3669/7378 [12:35:21<12:40:19, 12.30s/it] + 50%|████▉ | 3670/7378 [12:35:32<12:32:46, 12.18s/it] + +{'loss': 0.4238, 'learning_rate': 1.0570409963396313e-05, 'epoch': 0.5} + + 50%|████▉ | 3670/7378 [12:35:32<12:32:46, 12.18s/it] + 50%|████▉ | 3671/7378 [12:35:45<12:30:15, 12.14s/it] + +{'loss': 0.4877, 'learning_rate': 1.0566026904615844e-05, 'epoch': 0.5} + + 50%|████▉ | 3671/7378 [12:35:45<12:30:15, 12.14s/it] + 50%|████▉ | 3672/7378 [12:35:57<12:31:12, 12.16s/it] + +{'loss': 0.4694, 'learning_rate': 1.0561643736742556e-05, 'epoch': 0.5} + + 50%|████▉ | 3672/7378 [12:35:57<12:31:12, 12.16s/it] + 50%|████▉ | 3673/7378 [12:36:09<12:34:45, 12.22s/it] + +{'loss': 0.4655, 'learning_rate': 1.0557260460621242e-05, 'epoch': 0.5} + + 50%|████▉ | 3673/7378 [12:36:09<12:34:45, 12.22s/it] + 50%|████▉ | 3674/7378 [12:36:21<12:34:50, 12.23s/it] + +{'loss': 0.4909, 'learning_rate': 
1.0552877077096706e-05, 'epoch': 0.5} + + 50%|████▉ | 3674/7378 [12:36:21<12:34:50, 12.23s/it] + 50%|████▉ | 3675/7378 [12:36:34<12:34:19, 12.22s/it] + +{'loss': 0.5101, 'learning_rate': 1.054849358701378e-05, 'epoch': 0.5} + + 50%|████▉ | 3675/7378 [12:36:34<12:34:19, 12.22s/it] + 50%|████▉ | 3676/7378 [12:36:46<12:33:36, 12.21s/it] + +{'loss': 0.4846, 'learning_rate': 1.0544109991217309e-05, 'epoch': 0.5} + + 50%|████▉ | 3676/7378 [12:36:46<12:33:36, 12.21s/it] + 50%|████▉ | 3677/7378 [12:36:58<12:34:03, 12.22s/it] + +{'loss': 0.5045, 'learning_rate': 1.0539726290552163e-05, 'epoch': 0.5} + + 50%|████▉ | 3677/7378 [12:36:58<12:34:03, 12.22s/it] + 50%|████▉ | 3678/7378 [12:37:11<12:41:53, 12.36s/it] + +{'loss': 0.42, 'learning_rate': 1.0535342485863235e-05, 'epoch': 0.5} + + 50%|████▉ | 3678/7378 [12:37:11<12:41:53, 12.36s/it] + 50%|████▉ | 3679/7378 [12:37:23<12:40:36, 12.34s/it] + +{'loss': 0.5279, 'learning_rate': 1.0530958577995434e-05, 'epoch': 0.5} + + 50%|████▉ | 3679/7378 [12:37:23<12:40:36, 12.34s/it] + 50%|████▉ | 3680/7378 [12:37:36<12:45:47, 12.43s/it] + +{'loss': 0.4855, 'learning_rate': 1.0526574567793687e-05, 'epoch': 0.5} + + 50%|████▉ | 3680/7378 [12:37:36<12:45:47, 12.43s/it] + 50%|████▉ | 3681/7378 [12:37:48<12:48:40, 12.48s/it] + +{'loss': 0.3918, 'learning_rate': 1.052219045610294e-05, 'epoch': 0.5} + + 50%|████▉ | 3681/7378 [12:37:48<12:48:40, 12.48s/it] + 50%|████▉ | 3682/7378 [12:38:00<12:41:27, 12.36s/it] + +{'loss': 0.4434, 'learning_rate': 1.0517806243768172e-05, 'epoch': 0.5} + + 50%|████▉ | 3682/7378 [12:38:00<12:41:27, 12.36s/it] + 50%|████▉ | 3683/7378 [12:38:13<12:46:50, 12.45s/it] + +{'loss': 0.465, 'learning_rate': 1.0513421931634362e-05, 'epoch': 0.5} + + 50%|████▉ | 3683/7378 [12:38:13<12:46:50, 12.45s/it] + 50%|████▉ | 3684/7378 [12:38:25<12:41:43, 12.37s/it] + +{'loss': 0.4877, 'learning_rate': 1.0509037520546524e-05, 'epoch': 0.5} + + 50%|████▉ | 3684/7378 [12:38:25<12:41:43, 12.37s/it] + 50%|████▉ | 3685/7378 
[12:38:37<12:39:41, 12.34s/it] + +{'loss': 0.4801, 'learning_rate': 1.0504653011349678e-05, 'epoch': 0.5} + + 50%|████▉ | 3685/7378 [12:38:37<12:39:41, 12.34s/it] + 50%|████▉ | 3686/7378 [12:38:50<12:37:47, 12.32s/it] + +{'loss': 0.4725, 'learning_rate': 1.0500268404888873e-05, 'epoch': 0.5} + + 50%|████▉ | 3686/7378 [12:38:50<12:37:47, 12.32s/it] + 50%|████▉ | 3687/7378 [12:39:02<12:40:22, 12.36s/it] + +{'loss': 0.4005, 'learning_rate': 1.0495883702009178e-05, 'epoch': 0.5} + + 50%|████▉ | 3687/7378 [12:39:02<12:40:22, 12.36s/it] + 50%|████▉ | 3688/7378 [12:39:14<12:38:11, 12.33s/it] + +{'loss': 0.5001, 'learning_rate': 1.0491498903555667e-05, 'epoch': 0.5} + + 50%|████▉ | 3688/7378 [12:39:14<12:38:11, 12.33s/it] + 50%|█████ | 3689/7378 [12:39:27<12:35:39, 12.29s/it] + +{'loss': 0.4357, 'learning_rate': 1.0487114010373445e-05, 'epoch': 0.5} + + 50%|█████ | 3689/7378 [12:39:27<12:35:39, 12.29s/it] + 50%|█████ | 3690/7378 [12:39:39<12:36:00, 12.30s/it] + +{'loss': 0.4642, 'learning_rate': 1.048272902330763e-05, 'epoch': 0.5} + + 50%|█████ | 3690/7378 [12:39:39<12:36:00, 12.30s/it] + 50%|█████ | 3691/7378 [12:39:51<12:34:45, 12.28s/it] + +{'loss': 0.4176, 'learning_rate': 1.0478343943203364e-05, 'epoch': 0.5} + + 50%|█████ | 3691/7378 [12:39:51<12:34:45, 12.28s/it] + 50%|█████ | 3692/7378 [12:40:03<12:26:21, 12.15s/it] + +{'loss': 0.4492, 'learning_rate': 1.0473958770905797e-05, 'epoch': 0.5} + + 50%|█████ | 3692/7378 [12:40:03<12:26:21, 12.15s/it] + 50%|█████ | 3693/7378 [12:40:15<12:28:15, 12.18s/it] + +{'loss': 0.4438, 'learning_rate': 1.0469573507260107e-05, 'epoch': 0.5} + + 50%|█████ | 3693/7378 [12:40:15<12:28:15, 12.18s/it] + 50%|█████ | 3694/7378 [12:40:28<12:32:57, 12.26s/it] + +{'loss': 0.4415, 'learning_rate': 1.0465188153111481e-05, 'epoch': 0.5} + + 50%|█████ | 3694/7378 [12:40:28<12:32:57, 12.26s/it] + 50%|█████ | 3695/7378 [12:40:40<12:34:14, 12.29s/it] + +{'loss': 0.4976, 'learning_rate': 1.0460802709305126e-05, 'epoch': 0.5} + + 50%|█████ | 
3695/7378 [12:40:40<12:34:14, 12.29s/it] + 50%|█████ | 3696/7378 [12:40:53<12:45:58, 12.48s/it] + +{'loss': 0.4316, 'learning_rate': 1.0456417176686275e-05, 'epoch': 0.5} + + 50%|█████ | 3696/7378 [12:40:53<12:45:58, 12.48s/it] + 50%|█████ | 3697/7378 [12:41:05<12:47:10, 12.50s/it] + +{'loss': 0.4779, 'learning_rate': 1.0452031556100165e-05, 'epoch': 0.5} + + 50%|█████ | 3697/7378 [12:41:05<12:47:10, 12.50s/it] + 50%|█████ | 3698/7378 [12:41:17<12:37:01, 12.34s/it] + +{'loss': 0.4504, 'learning_rate': 1.0447645848392057e-05, 'epoch': 0.5} + + 50%|█████ | 3698/7378 [12:41:17<12:37:01, 12.34s/it] + 50%|█████ | 3699/7378 [12:41:30<12:31:49, 12.26s/it] + +{'loss': 0.3817, 'learning_rate': 1.0443260054407224e-05, 'epoch': 0.5} + + 50%|█████ | 3699/7378 [12:41:30<12:31:49, 12.26s/it] + 50%|█████ | 3700/7378 [12:41:42<12:35:08, 12.32s/it] + +{'loss': 0.4895, 'learning_rate': 1.0438874174990966e-05, 'epoch': 0.5} + + 50%|█████ | 3700/7378 [12:41:42<12:35:08, 12.32s/it] + 50%|█████ | 3701/7378 [12:41:54<12:37:31, 12.36s/it] + +{'loss': 0.4793, 'learning_rate': 1.0434488210988587e-05, 'epoch': 0.5} + + 50%|█████ | 3701/7378 [12:41:54<12:37:31, 12.36s/it] + 50%|█████ | 3702/7378 [12:42:07<12:33:15, 12.29s/it] + +{'loss': 0.4664, 'learning_rate': 1.043010216324541e-05, 'epoch': 0.5} + + 50%|█████ | 3702/7378 [12:42:07<12:33:15, 12.29s/it] + 50%|█████ | 3703/7378 [12:42:19<12:36:51, 12.36s/it] + +{'loss': 0.3924, 'learning_rate': 1.0425716032606787e-05, 'epoch': 0.5} + + 50%|█████ | 3703/7378 [12:42:19<12:36:51, 12.36s/it] + 50%|█████ | 3704/7378 [12:42:31<12:32:49, 12.29s/it] + +{'loss': 0.4346, 'learning_rate': 1.042132981991807e-05, 'epoch': 0.5} + + 50%|█████ | 3704/7378 [12:42:31<12:32:49, 12.29s/it] + 50%|█████ | 3705/7378 [12:42:43<12:27:35, 12.21s/it] + +{'loss': 0.4529, 'learning_rate': 1.0416943526024632e-05, 'epoch': 0.5} + + 50%|█████ | 3705/7378 [12:42:43<12:27:35, 12.21s/it] + 50%|█████ | 3706/7378 [12:42:55<12:24:52, 12.17s/it] + +{'loss': 0.4439, 
'learning_rate': 1.0412557151771865e-05, 'epoch': 0.5} + + 50%|█████ | 3706/7378 [12:42:55<12:24:52, 12.17s/it] + 50%|█████ | 3707/7378 [12:43:08<12:27:24, 12.22s/it] + +{'loss': 0.4713, 'learning_rate': 1.0408170698005172e-05, 'epoch': 0.5} + + 50%|█████ | 3707/7378 [12:43:08<12:27:24, 12.22s/it] + 50%|█████ | 3708/7378 [12:43:20<12:29:27, 12.25s/it] + +{'loss': 0.4883, 'learning_rate': 1.0403784165569972e-05, 'epoch': 0.5} + + 50%|█████ | 3708/7378 [12:43:20<12:29:27, 12.25s/it] + 50%|█████ | 3709/7378 [12:43:32<12:29:00, 12.25s/it] + +{'loss': 0.4497, 'learning_rate': 1.0399397555311701e-05, 'epoch': 0.5} + + 50%|█████ | 3709/7378 [12:43:32<12:29:00, 12.25s/it] + 50%|█████ | 3710/7378 [12:43:45<12:34:09, 12.34s/it] + +{'loss': 0.4591, 'learning_rate': 1.0395010868075814e-05, 'epoch': 0.5} + + 50%|█████ | 3710/7378 [12:43:45<12:34:09, 12.34s/it] + 50%|█████ | 3711/7378 [12:43:58<12:43:55, 12.50s/it] + +{'loss': 0.4349, 'learning_rate': 1.0390624104707771e-05, 'epoch': 0.5} + + 50%|█████ | 3711/7378 [12:43:58<12:43:55, 12.50s/it] + 50%|█████ | 3712/7378 [12:44:10<12:38:19, 12.41s/it] + +{'loss': 0.4307, 'learning_rate': 1.0386237266053054e-05, 'epoch': 0.5} + + 50%|█████ | 3712/7378 [12:44:10<12:38:19, 12.41s/it] + 50%|█████ | 3713/7378 [12:44:22<12:37:06, 12.39s/it] + +{'loss': 0.4204, 'learning_rate': 1.0381850352957157e-05, 'epoch': 0.5} + + 50%|█████ | 3713/7378 [12:44:22<12:37:06, 12.39s/it] + 50%|█████ | 3714/7378 [12:44:35<12:38:33, 12.42s/it] + +{'loss': 0.3886, 'learning_rate': 1.037746336626559e-05, 'epoch': 0.5} + + 50%|█████ | 3714/7378 [12:44:35<12:38:33, 12.42s/it] + 50%|█████ | 3715/7378 [12:44:47<12:36:17, 12.39s/it] + +{'loss': 0.4851, 'learning_rate': 1.0373076306823873e-05, 'epoch': 0.5} + + 50%|█████ | 3715/7378 [12:44:47<12:36:17, 12.39s/it] + 50%|█████ | 3716/7378 [12:44:59<12:30:46, 12.30s/it] + +{'loss': 0.4277, 'learning_rate': 1.0368689175477545e-05, 'epoch': 0.5} + + 50%|█████ | 3716/7378 [12:44:59<12:30:46, 12.30s/it] + 50%|█████ | 
3717/7378 [12:45:11<12:30:02, 12.29s/it] + +{'loss': 0.4431, 'learning_rate': 1.0364301973072156e-05, 'epoch': 0.5} + + 50%|█████ | 3717/7378 [12:45:11<12:30:02, 12.29s/it] + 50%|█████ | 3718/7378 [12:45:24<12:27:16, 12.25s/it] + +{'loss': 0.3589, 'learning_rate': 1.0359914700453268e-05, 'epoch': 0.5} + + 50%|█████ | 3718/7378 [12:45:24<12:27:16, 12.25s/it] + 50%|█████ | 3719/7378 [12:45:36<12:29:36, 12.29s/it] + +{'loss': 0.4666, 'learning_rate': 1.035552735846647e-05, 'epoch': 0.5} + + 50%|█████ | 3719/7378 [12:45:36<12:29:36, 12.29s/it] + 50%|█████ | 3720/7378 [12:45:48<12:29:26, 12.29s/it] + +{'loss': 0.4307, 'learning_rate': 1.0351139947957336e-05, 'epoch': 0.5} + + 50%|█████ | 3720/7378 [12:45:48<12:29:26, 12.29s/it] + 50%|█████ | 3721/7378 [12:46:00<12:24:58, 12.22s/it] + +{'loss': 0.5114, 'learning_rate': 1.0346752469771485e-05, 'epoch': 0.5} + + 50%|█████ | 3721/7378 [12:46:00<12:24:58, 12.22s/it] + 50%|█████ | 3722/7378 [12:46:13<12:31:20, 12.33s/it] + +{'loss': 0.4808, 'learning_rate': 1.0342364924754528e-05, 'epoch': 0.5} + + 50%|█████ | 3722/7378 [12:46:13<12:31:20, 12.33s/it] + 50%|█████ | 3723/7378 [12:46:25<12:34:00, 12.38s/it] + +{'loss': 0.4931, 'learning_rate': 1.0337977313752102e-05, 'epoch': 0.5} + + 50%|█████ | 3723/7378 [12:46:25<12:34:00, 12.38s/it] + 50%|█████ | 3724/7378 [12:46:37<12:29:34, 12.31s/it] + +{'loss': 0.5397, 'learning_rate': 1.033358963760984e-05, 'epoch': 0.5} + + 50%|█████ | 3724/7378 [12:46:37<12:29:34, 12.31s/it] + 50%|█████ | 3725/7378 [12:46:50<12:32:34, 12.36s/it] + +{'loss': 0.4436, 'learning_rate': 1.0329201897173402e-05, 'epoch': 0.5} + + 50%|█████ | 3725/7378 [12:46:50<12:32:34, 12.36s/it] + 51%|█████ | 3726/7378 [12:47:02<12:27:27, 12.28s/it] + +{'loss': 0.4889, 'learning_rate': 1.0324814093288463e-05, 'epoch': 0.51} + + 51%|█████ | 3726/7378 [12:47:02<12:27:27, 12.28s/it] + 51%|█████ | 3727/7378 [12:47:14<12:25:49, 12.26s/it] + +{'loss': 0.4848, 'learning_rate': 1.0320426226800693e-05, 'epoch': 0.51} + + 51%|█████ 
| 3727/7378 [12:47:14<12:25:49, 12.26s/it] + 51%|█████ | 3728/7378 [12:47:27<12:29:29, 12.32s/it] + +{'loss': 0.5, 'learning_rate': 1.0316038298555793e-05, 'epoch': 0.51} + + 51%|█████ | 3728/7378 [12:47:27<12:29:29, 12.32s/it] + 51%|█████ | 3729/7378 [12:47:39<12:28:48, 12.31s/it] + +{'loss': 0.4412, 'learning_rate': 1.0311650309399463e-05, 'epoch': 0.51} + + 51%|█████ | 3729/7378 [12:47:39<12:28:48, 12.31s/it] + 51%|█████ | 3730/7378 [12:47:51<12:31:26, 12.36s/it] + +{'loss': 0.4653, 'learning_rate': 1.030726226017742e-05, 'epoch': 0.51} + + 51%|█████ | 3730/7378 [12:47:52<12:31:26, 12.36s/it] + 51%|█████ | 3731/7378 [12:48:04<12:26:23, 12.28s/it] + +{'loss': 0.5013, 'learning_rate': 1.0302874151735391e-05, 'epoch': 0.51} + + 51%|█████ | 3731/7378 [12:48:04<12:26:23, 12.28s/it] + 51%|█████ | 3732/7378 [12:48:16<12:34:21, 12.41s/it] + +{'loss': 0.5021, 'learning_rate': 1.0298485984919115e-05, 'epoch': 0.51} + + 51%|█████ | 3732/7378 [12:48:16<12:34:21, 12.41s/it] + 51%|█████ | 3733/7378 [12:48:28<12:27:07, 12.30s/it] + +{'loss': 0.5083, 'learning_rate': 1.0294097760574345e-05, 'epoch': 0.51} + + 51%|█████ | 3733/7378 [12:48:28<12:27:07, 12.30s/it] + 51%|█████ | 3734/7378 [12:48:40<12:24:04, 12.25s/it] + +{'loss': 0.4204, 'learning_rate': 1.028970947954684e-05, 'epoch': 0.51} + + 51%|█████ | 3734/7378 [12:48:40<12:24:04, 12.25s/it] + 51%|█████ | 3735/7378 [12:48:53<12:23:15, 12.24s/it] + +{'loss': 0.4937, 'learning_rate': 1.0285321142682372e-05, 'epoch': 0.51} + + 51%|█████ | 3735/7378 [12:48:53<12:23:15, 12.24s/it] + 51%|█████ | 3736/7378 [12:49:05<12:18:56, 12.17s/it] + +{'loss': 0.459, 'learning_rate': 1.028093275082673e-05, 'epoch': 0.51} + + 51%|█████ | 3736/7378 [12:49:05<12:18:56, 12.17s/it] + 51%|█████ | 3737/7378 [12:49:17<12:19:40, 12.19s/it] + +{'loss': 0.4556, 'learning_rate': 1.0276544304825695e-05, 'epoch': 0.51} + + 51%|█████ | 3737/7378 [12:49:17<12:19:40, 12.19s/it] + 51%|█████ | 3738/7378 [12:49:29<12:17:43, 12.16s/it] + +{'loss': 0.4712, 
'learning_rate': 1.0272155805525085e-05, 'epoch': 0.51} + + 51%|█████ | 3738/7378 [12:49:29<12:17:43, 12.16s/it] + 51%|█████ | 3739/7378 [12:49:41<12:22:25, 12.24s/it] + +{'loss': 0.4931, 'learning_rate': 1.0267767253770707e-05, 'epoch': 0.51} + + 51%|█████ | 3739/7378 [12:49:41<12:22:25, 12.24s/it] + 51%|█████ | 3740/7378 [12:49:54<12:19:42, 12.20s/it] + +{'loss': 0.453, 'learning_rate': 1.0263378650408386e-05, 'epoch': 0.51} + + 51%|█████ | 3740/7378 [12:49:54<12:19:42, 12.20s/it] + 51%|█████ | 3741/7378 [12:50:06<12:29:35, 12.37s/it] + +{'loss': 0.4062, 'learning_rate': 1.0258989996283959e-05, 'epoch': 0.51} + + 51%|█████ | 3741/7378 [12:50:06<12:29:35, 12.37s/it] + 51%|█████ | 3742/7378 [12:50:19<12:28:56, 12.36s/it] + +{'loss': 0.4773, 'learning_rate': 1.025460129224327e-05, 'epoch': 0.51} + + 51%|█████ | 3742/7378 [12:50:19<12:28:56, 12.36s/it] + 51%|█████ | 3743/7378 [12:50:31<12:32:57, 12.43s/it] + +{'loss': 0.4739, 'learning_rate': 1.025021253913217e-05, 'epoch': 0.51} + + 51%|█████ | 3743/7378 [12:50:31<12:32:57, 12.43s/it] + 51%|█████ | 3744/7378 [12:50:44<12:30:01, 12.38s/it] + +{'loss': 0.5178, 'learning_rate': 1.0245823737796525e-05, 'epoch': 0.51} + + 51%|█████ | 3744/7378 [12:50:44<12:30:01, 12.38s/it] + 51%|█████ | 3745/7378 [12:50:55<12:19:11, 12.21s/it] + +{'loss': 0.4306, 'learning_rate': 1.0241434889082207e-05, 'epoch': 0.51} + + 51%|█████ | 3745/7378 [12:50:55<12:19:11, 12.21s/it] + 51%|█████ | 3746/7378 [12:51:07<12:14:41, 12.14s/it] + +{'loss': 0.3987, 'learning_rate': 1.0237045993835096e-05, 'epoch': 0.51} + + 51%|█████ | 3746/7378 [12:51:07<12:14:41, 12.14s/it] + 51%|█████ | 3747/7378 [12:51:20<12:24:39, 12.31s/it] + +{'loss': 0.5035, 'learning_rate': 1.0232657052901087e-05, 'epoch': 0.51} + + 51%|█████ | 3747/7378 [12:51:20<12:24:39, 12.31s/it] + 51%|█████ | 3748/7378 [12:51:32<12:18:21, 12.20s/it] + +{'loss': 0.4965, 'learning_rate': 1.0228268067126074e-05, 'epoch': 0.51} + + 51%|█████ | 3748/7378 [12:51:32<12:18:21, 12.20s/it] + 
51%|█████ | 3749/7378 [12:51:44<12:21:23, 12.26s/it] + +{'loss': 0.5059, 'learning_rate': 1.0223879037355972e-05, 'epoch': 0.51} + + 51%|█████ | 3749/7378 [12:51:44<12:21:23, 12.26s/it] + 51%|█████ | 3750/7378 [12:51:57<12:23:43, 12.30s/it] + +{'loss': 0.4609, 'learning_rate': 1.0219489964436695e-05, 'epoch': 0.51} + + 51%|█████ | 3750/7378 [12:51:57<12:23:43, 12.30s/it] + 51%|█████ | 3751/7378 [12:52:09<12:19:46, 12.24s/it] + +{'loss': 0.4964, 'learning_rate': 1.0215100849214165e-05, 'epoch': 0.51} + + 51%|█████ | 3751/7378 [12:52:09<12:19:46, 12.24s/it] + 51%|█████ | 3752/7378 [12:52:21<12:18:01, 12.21s/it] + +{'loss': 0.4114, 'learning_rate': 1.021071169253432e-05, 'epoch': 0.51} + + 51%|█████ | 3752/7378 [12:52:21<12:18:01, 12.21s/it] + 51%|█████ | 3753/7378 [12:52:35<12:44:28, 12.65s/it] + +{'loss': 0.4661, 'learning_rate': 1.0206322495243093e-05, 'epoch': 0.51} + + 51%|█████ | 3753/7378 [12:52:35<12:44:28, 12.65s/it] + 51%|█████ | 3754/7378 [12:52:47<12:40:23, 12.59s/it] + +{'loss': 0.4746, 'learning_rate': 1.0201933258186442e-05, 'epoch': 0.51} + + 51%|█████ | 3754/7378 [12:52:47<12:40:23, 12.59s/it] + 51%|█████ | 3755/7378 [12:52:59<12:27:59, 12.39s/it] + +{'loss': 0.4769, 'learning_rate': 1.019754398221032e-05, 'epoch': 0.51} + + 51%|█████ | 3755/7378 [12:52:59<12:27:59, 12.39s/it] + 51%|█████ | 3756/7378 [12:53:12<12:31:04, 12.44s/it] + +{'loss': 0.5142, 'learning_rate': 1.0193154668160692e-05, 'epoch': 0.51} + + 51%|█████ | 3756/7378 [12:53:12<12:31:04, 12.44s/it] + 51%|█████ | 3757/7378 [12:53:24<12:35:20, 12.52s/it] + +{'loss': 0.4588, 'learning_rate': 1.0188765316883529e-05, 'epoch': 0.51} + + 51%|█████ | 3757/7378 [12:53:24<12:35:20, 12.52s/it] + 51%|█████ | 3758/7378 [12:53:37<12:30:47, 12.44s/it] + +{'loss': 0.512, 'learning_rate': 1.0184375929224808e-05, 'epoch': 0.51} + + 51%|█████ | 3758/7378 [12:53:37<12:30:47, 12.44s/it] + 51%|█████ | 3759/7378 [12:53:49<12:22:19, 12.31s/it] + +{'loss': 0.4703, 'learning_rate': 1.0179986506030518e-05, 'epoch': 
0.51} + + 51%|█████ | 3759/7378 [12:53:49<12:22:19, 12.31s/it] + 51%|█████ | 3760/7378 [12:54:01<12:18:12, 12.24s/it] + +{'loss': 0.3735, 'learning_rate': 1.0175597048146647e-05, 'epoch': 0.51} + + 51%|█████ | 3760/7378 [12:54:01<12:18:12, 12.24s/it] + 51%|█████ | 3761/7378 [12:54:13<12:23:02, 12.33s/it] + +{'loss': 0.4178, 'learning_rate': 1.0171207556419198e-05, 'epoch': 0.51} + + 51%|█████ | 3761/7378 [12:54:13<12:23:02, 12.33s/it] + 51%|█████ | 3762/7378 [12:54:25<12:18:00, 12.25s/it] + +{'loss': 0.4539, 'learning_rate': 1.0166818031694174e-05, 'epoch': 0.51} + + 51%|█████ | 3762/7378 [12:54:25<12:18:00, 12.25s/it] + 51%|█████ | 3763/7378 [12:54:38<12:18:57, 12.26s/it] + +{'loss': 0.4149, 'learning_rate': 1.0162428474817591e-05, 'epoch': 0.51} + + 51%|█████ | 3763/7378 [12:54:38<12:18:57, 12.26s/it] + 51%|█████ | 3764/7378 [12:54:50<12:15:07, 12.20s/it] + +{'loss': 0.4346, 'learning_rate': 1.0158038886635462e-05, 'epoch': 0.51} + + 51%|█████ | 3764/7378 [12:54:50<12:15:07, 12.20s/it] + 51%|█████ | 3765/7378 [12:55:02<12:16:47, 12.24s/it] + +{'loss': 0.4043, 'learning_rate': 1.015364926799382e-05, 'epoch': 0.51} + + 51%|█████ | 3765/7378 [12:55:02<12:16:47, 12.24s/it] + 51%|█████ | 3766/7378 [12:55:15<12:23:38, 12.35s/it] + +{'loss': 0.4407, 'learning_rate': 1.0149259619738685e-05, 'epoch': 0.51} + + 51%|█████ | 3766/7378 [12:55:15<12:23:38, 12.35s/it] + 51%|█████ | 3767/7378 [12:55:28<12:36:52, 12.58s/it] + +{'loss': 0.4676, 'learning_rate': 1.0144869942716098e-05, 'epoch': 0.51} + + 51%|█████ | 3767/7378 [12:55:28<12:36:52, 12.58s/it] + 51%|█████ | 3768/7378 [12:55:40<12:27:58, 12.43s/it] + +{'loss': 0.4639, 'learning_rate': 1.0140480237772098e-05, 'epoch': 0.51} + + 51%|█████ | 3768/7378 [12:55:40<12:27:58, 12.43s/it] + 51%|█████ | 3769/7378 [12:55:52<12:26:52, 12.42s/it] + +{'loss': 0.4636, 'learning_rate': 1.0136090505752736e-05, 'epoch': 0.51} + + 51%|█████ | 3769/7378 [12:55:52<12:26:52, 12.42s/it] + 51%|█████ | 3770/7378 [12:56:04<12:23:43, 12.37s/it] + 
+{'loss': 0.4693, 'learning_rate': 1.0131700747504064e-05, 'epoch': 0.51} + + 51%|█████ | 3770/7378 [12:56:04<12:23:43, 12.37s/it] + 51%|█████ | 3771/7378 [12:56:16<12:19:14, 12.30s/it] + +{'loss': 0.4576, 'learning_rate': 1.0127310963872134e-05, 'epoch': 0.51} + + 51%|█████ | 3771/7378 [12:56:17<12:19:14, 12.30s/it] + 51%|█████ | 3772/7378 [12:56:29<12:19:25, 12.30s/it] + +{'loss': 0.4879, 'learning_rate': 1.0122921155703011e-05, 'epoch': 0.51} + + 51%|█████ | 3772/7378 [12:56:29<12:19:25, 12.30s/it] + 51%|█████ | 3773/7378 [12:56:41<12:17:55, 12.28s/it] + +{'loss': 0.4178, 'learning_rate': 1.0118531323842764e-05, 'epoch': 0.51} + + 51%|█████ | 3773/7378 [12:56:41<12:17:55, 12.28s/it] + 51%|█████ | 3774/7378 [12:56:53<12:14:05, 12.22s/it] + +{'loss': 0.4164, 'learning_rate': 1.0114141469137459e-05, 'epoch': 0.51} + + 51%|█████ | 3774/7378 [12:56:53<12:14:05, 12.22s/it] + 51%|█████ | 3775/7378 [12:57:06<12:20:06, 12.32s/it] + +{'loss': 0.4914, 'learning_rate': 1.0109751592433177e-05, 'epoch': 0.51} + + 51%|█████ | 3775/7378 [12:57:06<12:20:06, 12.32s/it] + 51%|█████ | 3776/7378 [12:57:18<12:19:17, 12.31s/it] + +{'loss': 0.4382, 'learning_rate': 1.0105361694575992e-05, 'epoch': 0.51} + + 51%|█████ | 3776/7378 [12:57:18<12:19:17, 12.31s/it] + 51%|█████ | 3777/7378 [12:57:30<12:12:12, 12.20s/it] + +{'loss': 0.4823, 'learning_rate': 1.0100971776411996e-05, 'epoch': 0.51} + + 51%|█████ | 3777/7378 [12:57:30<12:12:12, 12.20s/it] + 51%|█████ | 3778/7378 [12:57:43<12:20:35, 12.34s/it] + +{'loss': 0.4337, 'learning_rate': 1.009658183878727e-05, 'epoch': 0.51} + + 51%|█████ | 3778/7378 [12:57:43<12:20:35, 12.34s/it] + 51%|█████ | 3779/7378 [12:57:55<12:23:29, 12.40s/it] + +{'loss': 0.4766, 'learning_rate': 1.009219188254791e-05, 'epoch': 0.51} + + 51%|█████ | 3779/7378 [12:57:55<12:23:29, 12.40s/it] + 51%|█████ | 3780/7378 [12:58:07<12:20:51, 12.35s/it] + +{'loss': 0.4853, 'learning_rate': 1.0087801908540009e-05, 'epoch': 0.51} + + 51%|█████ | 3780/7378 [12:58:07<12:20:51, 
12.35s/it] + 51%|█████ | 3781/7378 [12:58:19<12:13:18, 12.23s/it] + +{'loss': 0.4912, 'learning_rate': 1.0083411917609664e-05, 'epoch': 0.51} + + 51%|█████ | 3781/7378 [12:58:19<12:13:18, 12.23s/it] + 51%|█████▏ | 3782/7378 [12:58:32<12:15:12, 12.27s/it] + +{'loss': 0.4345, 'learning_rate': 1.0079021910602982e-05, 'epoch': 0.51} + + 51%|█████▏ | 3782/7378 [12:58:32<12:15:12, 12.27s/it] + 51%|█████▏ | 3783/7378 [12:58:44<12:21:16, 12.37s/it] + +{'loss': 0.4394, 'learning_rate': 1.0074631888366063e-05, 'epoch': 0.51} + + 51%|█████▏ | 3783/7378 [12:58:44<12:21:16, 12.37s/it] + 51%|█████▏ | 3784/7378 [12:58:57<12:20:14, 12.36s/it] + +{'loss': 0.4584, 'learning_rate': 1.0070241851745018e-05, 'epoch': 0.51} + + 51%|█████▏ | 3784/7378 [12:58:57<12:20:14, 12.36s/it] + 51%|█████▏ | 3785/7378 [12:59:09<12:17:24, 12.31s/it] + +{'loss': 0.5447, 'learning_rate': 1.0065851801585956e-05, 'epoch': 0.51} + + 51%|█████▏ | 3785/7378 [12:59:09<12:17:24, 12.31s/it] + 51%|█████▏ | 3786/7378 [12:59:21<12:18:01, 12.33s/it] + +{'loss': 0.4693, 'learning_rate': 1.006146173873499e-05, 'epoch': 0.51} + + 51%|█████▏ | 3786/7378 [12:59:21<12:18:01, 12.33s/it] + 51%|█████▏ | 3787/7378 [12:59:33<12:14:03, 12.27s/it] + +{'loss': 0.4622, 'learning_rate': 1.005707166403824e-05, 'epoch': 0.51} + + 51%|█████▏ | 3787/7378 [12:59:33<12:14:03, 12.27s/it] + 51%|█████▏ | 3788/7378 [12:59:46<12:18:30, 12.34s/it] + +{'loss': 0.4103, 'learning_rate': 1.005268157834182e-05, 'epoch': 0.51} + + 51%|█████▏ | 3788/7378 [12:59:46<12:18:30, 12.34s/it] + 51%|█████▏ | 3789/7378 [12:59:58<12:17:24, 12.33s/it] + +{'loss': 0.509, 'learning_rate': 1.0048291482491853e-05, 'epoch': 0.51} + + 51%|█████▏ | 3789/7378 [12:59:58<12:17:24, 12.33s/it] + 51%|█████▏ | 3790/7378 [13:00:11<12:25:15, 12.46s/it] + +{'loss': 0.5211, 'learning_rate': 1.0043901377334453e-05, 'epoch': 0.51} + + 51%|█████▏ | 3790/7378 [13:00:11<12:25:15, 12.46s/it] + 51%|█████▏ | 3791/7378 [13:00:23<12:22:58, 12.43s/it] + +{'loss': 0.4473, 'learning_rate': 
1.0039511263715757e-05, 'epoch': 0.51} + + 51%|█████▏ | 3791/7378 [13:00:23<12:22:58, 12.43s/it] + 51%|█████▏ | 3792/7378 [13:00:35<12:18:10, 12.35s/it] + +{'loss': 0.4003, 'learning_rate': 1.0035121142481883e-05, 'epoch': 0.51} + + 51%|█████▏ | 3792/7378 [13:00:35<12:18:10, 12.35s/it] + 51%|█████▏ | 3793/7378 [13:00:47<12:11:33, 12.24s/it] + +{'loss': 0.4984, 'learning_rate': 1.0030731014478958e-05, 'epoch': 0.51} + + 51%|█████▏ | 3793/7378 [13:00:47<12:11:33, 12.24s/it] + 51%|█████▏ | 3794/7378 [13:01:00<12:10:46, 12.23s/it] + +{'loss': 0.4047, 'learning_rate': 1.0026340880553114e-05, 'epoch': 0.51} + + 51%|█████▏ | 3794/7378 [13:01:00<12:10:46, 12.23s/it] + 51%|█████▏ | 3795/7378 [13:01:12<12:15:12, 12.31s/it] + +{'loss': 0.4103, 'learning_rate': 1.0021950741550474e-05, 'epoch': 0.51} + + 51%|█████▏ | 3795/7378 [13:01:12<12:15:12, 12.31s/it] + 51%|█████▏ | 3796/7378 [13:01:25<12:16:29, 12.34s/it] + +{'loss': 0.4598, 'learning_rate': 1.0017560598317178e-05, 'epoch': 0.51} + + 51%|█████▏ | 3796/7378 [13:01:25<12:16:29, 12.34s/it] + 51%|█████▏ | 3797/7378 [13:01:37<12:23:30, 12.46s/it] + +{'loss': 0.4722, 'learning_rate': 1.0013170451699347e-05, 'epoch': 0.51} + + 51%|█████▏ | 3797/7378 [13:01:37<12:23:30, 12.46s/it] + 51%|█████▏ | 3798/7378 [13:01:50<12:26:47, 12.52s/it] + +{'loss': 0.493, 'learning_rate': 1.000878030254312e-05, 'epoch': 0.51} + + 51%|█████▏ | 3798/7378 [13:01:50<12:26:47, 12.52s/it] + 51%|█████▏ | 3799/7378 [13:02:02<12:24:46, 12.49s/it] + +{'loss': 0.4558, 'learning_rate': 1.0004390151694627e-05, 'epoch': 0.51} + + 51%|█████▏ | 3799/7378 [13:02:02<12:24:46, 12.49s/it] + 52%|█████▏ | 3800/7378 [13:02:15<12:24:31, 12.48s/it] + +{'loss': 0.4262, 'learning_rate': 1e-05, 'epoch': 0.52} + + 52%|█████▏ | 3800/7378 [13:02:15<12:24:31, 12.48s/it] + 52%|█████▏ | 3801/7378 [13:02:27<12:21:37, 12.44s/it] + +{'loss': 0.4494, 'learning_rate': 9.995609848305376e-06, 'epoch': 0.52} + + 52%|█████▏ | 3801/7378 [13:02:27<12:21:37, 12.44s/it] + 52%|█████▏ | 
3802/7378 [13:02:40<12:25:22, 12.51s/it] + +{'loss': 0.4377, 'learning_rate': 9.991219697456882e-06, 'epoch': 0.52} + + 52%|█████▏ | 3802/7378 [13:02:40<12:25:22, 12.51s/it] + 52%|█████▏ | 3803/7378 [13:02:52<12:21:36, 12.45s/it] + +{'loss': 0.4257, 'learning_rate': 9.986829548300656e-06, 'epoch': 0.52} + + 52%|█████▏ | 3803/7378 [13:02:52<12:21:36, 12.45s/it] + 52%|█████▏ | 3804/7378 [13:03:04<12:11:46, 12.28s/it] + +{'loss': 0.4516, 'learning_rate': 9.982439401682827e-06, 'epoch': 0.52} + + 52%|█████▏ | 3804/7378 [13:03:04<12:11:46, 12.28s/it] + 52%|█████▏ | 3805/7378 [13:03:16<12:11:15, 12.28s/it] + +{'loss': 0.4855, 'learning_rate': 9.978049258449528e-06, 'epoch': 0.52} + + 52%|█████▏ | 3805/7378 [13:03:16<12:11:15, 12.28s/it] + 52%|█████▏ | 3806/7378 [13:03:29<12:11:25, 12.29s/it] + +{'loss': 0.473, 'learning_rate': 9.973659119446889e-06, 'epoch': 0.52} + + 52%|█████▏ | 3806/7378 [13:03:29<12:11:25, 12.29s/it] + 52%|█████▏ | 3807/7378 [13:03:41<12:16:12, 12.37s/it] + +{'loss': 0.4094, 'learning_rate': 9.969268985521044e-06, 'epoch': 0.52} + + 52%|█████▏ | 3807/7378 [13:03:41<12:16:12, 12.37s/it] + 52%|█████▏ | 3808/7378 [13:03:53<12:10:43, 12.28s/it] + +{'loss': 0.419, 'learning_rate': 9.964878857518119e-06, 'epoch': 0.52} + + 52%|█████▏ | 3808/7378 [13:03:53<12:10:43, 12.28s/it] + 52%|█████▏ | 3809/7378 [13:04:05<12:06:52, 12.22s/it] + +{'loss': 0.4639, 'learning_rate': 9.960488736284246e-06, 'epoch': 0.52} + + 52%|█████▏ | 3809/7378 [13:04:05<12:06:52, 12.22s/it] + 52%|█████▏ | 3810/7378 [13:04:17<12:01:16, 12.13s/it] + +{'loss': 0.4588, 'learning_rate': 9.956098622665548e-06, 'epoch': 0.52} + + 52%|█████▏ | 3810/7378 [13:04:17<12:01:16, 12.13s/it] + 52%|█████▏ | 3811/7378 [13:04:30<12:08:58, 12.26s/it] + +{'loss': 0.5535, 'learning_rate': 9.951708517508152e-06, 'epoch': 0.52} + + 52%|█████▏ | 3811/7378 [13:04:30<12:08:58, 12.26s/it] + 52%|█████▏ | 3812/7378 [13:04:42<12:10:00, 12.28s/it] + +{'loss': 0.4903, 'learning_rate': 9.947318421658185e-06, 'epoch': 
0.52} + + 52%|█████▏ | 3812/7378 [13:04:42<12:10:00, 12.28s/it] + 52%|█████▏ | 3813/7378 [13:04:54<12:07:31, 12.24s/it] + +{'loss': 0.5006, 'learning_rate': 9.942928335961765e-06, 'epoch': 0.52} + + 52%|█████▏ | 3813/7378 [13:04:54<12:07:31, 12.24s/it] + 52%|█████▏ | 3814/7378 [13:05:07<12:12:59, 12.34s/it] + +{'loss': 0.4441, 'learning_rate': 9.938538261265014e-06, 'epoch': 0.52} + + 52%|█████▏ | 3814/7378 [13:05:07<12:12:59, 12.34s/it] + 52%|█████▏ | 3815/7378 [13:05:19<12:11:48, 12.32s/it] + +{'loss': 0.4854, 'learning_rate': 9.93414819841405e-06, 'epoch': 0.52} + + 52%|█████▏ | 3815/7378 [13:05:19<12:11:48, 12.32s/it] + 52%|█████▏ | 3816/7378 [13:05:31<12:08:42, 12.27s/it] + +{'loss': 0.4304, 'learning_rate': 9.929758148254987e-06, 'epoch': 0.52} + + 52%|█████▏ | 3816/7378 [13:05:31<12:08:42, 12.27s/it] + 52%|█████▏ | 3817/7378 [13:05:44<12:08:49, 12.28s/it] + +{'loss': 0.4459, 'learning_rate': 9.925368111633944e-06, 'epoch': 0.52} + + 52%|█████▏ | 3817/7378 [13:05:44<12:08:49, 12.28s/it] + 52%|█████▏ | 3818/7378 [13:05:56<12:09:10, 12.29s/it] + +{'loss': 0.4611, 'learning_rate': 9.920978089397025e-06, 'epoch': 0.52} + + 52%|█████▏ | 3818/7378 [13:05:56<12:09:10, 12.29s/it] + 52%|█████▏ | 3819/7378 [13:06:08<12:07:29, 12.26s/it] + +{'loss': 0.4558, 'learning_rate': 9.916588082390342e-06, 'epoch': 0.52} + + 52%|█████▏ | 3819/7378 [13:06:08<12:07:29, 12.26s/it] + 52%|█████▏ | 3820/7378 [13:06:21<12:12:33, 12.35s/it] + +{'loss': 0.4058, 'learning_rate': 9.912198091459996e-06, 'epoch': 0.52} + + 52%|█████▏ | 3820/7378 [13:06:21<12:12:33, 12.35s/it] + 52%|█████▏ | 3821/7378 [13:06:33<12:13:13, 12.37s/it] + +{'loss': 0.4754, 'learning_rate': 9.907808117452096e-06, 'epoch': 0.52} + + 52%|█████▏ | 3821/7378 [13:06:33<12:13:13, 12.37s/it] + 52%|█████▏ | 3822/7378 [13:06:46<12:19:18, 12.47s/it] + +{'loss': 0.4547, 'learning_rate': 9.903418161212732e-06, 'epoch': 0.52} + + 52%|█████▏ | 3822/7378 [13:06:46<12:19:18, 12.47s/it] + 52%|█████▏ | 3823/7378 [13:06:58<12:20:08, 
12.49s/it] + +{'loss': 0.5042, 'learning_rate': 9.899028223588003e-06, 'epoch': 0.52} + + 52%|█████▏ | 3823/7378 [13:06:58<12:20:08, 12.49s/it] + 52%|█████▏ | 3824/7378 [13:07:11<12:15:17, 12.41s/it] + +{'loss': 0.4155, 'learning_rate': 9.894638305424007e-06, 'epoch': 0.52} + + 52%|█████▏ | 3824/7378 [13:07:11<12:15:17, 12.41s/it] + 52%|█████▏ | 3825/7378 [13:07:23<12:14:10, 12.40s/it] + +{'loss': 0.3841, 'learning_rate': 9.890248407566823e-06, 'epoch': 0.52} + + 52%|█████▏ | 3825/7378 [13:07:23<12:14:10, 12.40s/it] + 52%|█████▏ | 3826/7378 [13:07:35<12:13:42, 12.39s/it] + +{'loss': 0.476, 'learning_rate': 9.885858530862543e-06, 'epoch': 0.52} + + 52%|█████▏ | 3826/7378 [13:07:35<12:13:42, 12.39s/it] + 52%|█████▏ | 3827/7378 [13:07:48<12:10:33, 12.34s/it] + +{'loss': 0.453, 'learning_rate': 9.88146867615724e-06, 'epoch': 0.52} + + 52%|█████▏ | 3827/7378 [13:07:48<12:10:33, 12.34s/it] + 52%|█████▏ | 3828/7378 [13:07:59<12:02:16, 12.21s/it] + +{'loss': 0.534, 'learning_rate': 9.877078844296989e-06, 'epoch': 0.52} + + 52%|█████▏ | 3828/7378 [13:07:59<12:02:16, 12.21s/it] + 52%|█████▏ | 3829/7378 [13:08:12<12:02:58, 12.22s/it] + +{'loss': 0.4577, 'learning_rate': 9.872689036127869e-06, 'epoch': 0.52} + + 52%|█████▏ | 3829/7378 [13:08:12<12:02:58, 12.22s/it] + 52%|█████▏ | 3830/7378 [13:08:24<12:07:18, 12.30s/it] + +{'loss': 0.4394, 'learning_rate': 9.868299252495938e-06, 'epoch': 0.52} + + 52%|█████▏ | 3830/7378 [13:08:24<12:07:18, 12.30s/it] + 52%|█████▏ | 3831/7378 [13:08:37<12:09:17, 12.34s/it] + +{'loss': 0.4637, 'learning_rate': 9.863909494247264e-06, 'epoch': 0.52} + + 52%|█████▏ | 3831/7378 [13:08:37<12:09:17, 12.34s/it] + 52%|█████▏ | 3832/7378 [13:08:49<12:07:50, 12.32s/it] + +{'loss': 0.43, 'learning_rate': 9.859519762227902e-06, 'epoch': 0.52} + + 52%|█████▏ | 3832/7378 [13:08:49<12:07:50, 12.32s/it] + 52%|█████▏ | 3833/7378 [13:09:01<12:03:04, 12.24s/it] + +{'loss': 0.5136, 'learning_rate': 9.855130057283905e-06, 'epoch': 0.52} + + 52%|█████▏ | 3833/7378
[13:09:01<12:03:04, 12.24s/it] + 52%|█████▏ | 3834/7378 [13:09:13<12:02:29, 12.23s/it] + +{'loss': 0.4551, 'learning_rate': 9.850740380261318e-06, 'epoch': 0.52} + + 52%|█████▏ | 3834/7378 [13:09:13<12:02:29, 12.23s/it] + 52%|█████▏ | 3835/7378 [13:09:25<12:04:00, 12.26s/it] + +{'loss': 0.4508, 'learning_rate': 9.846350732006184e-06, 'epoch': 0.52} + + 52%|█████▏ | 3835/7378 [13:09:25<12:04:00, 12.26s/it] + 52%|█████▏ | 3836/7378 [13:09:38<12:09:40, 12.36s/it] + +{'loss': 0.4608, 'learning_rate': 9.84196111336454e-06, 'epoch': 0.52} + + 52%|█████▏ | 3836/7378 [13:09:38<12:09:40, 12.36s/it] + 52%|█████▏ | 3837/7378 [13:09:50<12:04:56, 12.28s/it] + +{'loss': 0.3886, 'learning_rate': 9.837571525182412e-06, 'epoch': 0.52} + + 52%|█████▏ | 3837/7378 [13:09:50<12:04:56, 12.28s/it] + 52%|█████▏ | 3838/7378 [13:10:02<12:01:35, 12.23s/it] + +{'loss': 0.4654, 'learning_rate': 9.83318196830583e-06, 'epoch': 0.52} + + 52%|█████▏ | 3838/7378 [13:10:02<12:01:35, 12.23s/it] + 52%|█████▏ | 3839/7378 [13:10:15<12:09:13, 12.36s/it] + +{'loss': 0.3958, 'learning_rate': 9.828792443580805e-06, 'epoch': 0.52} + + 52%|█████▏ | 3839/7378 [13:10:15<12:09:13, 12.36s/it] + 52%|█████▏ | 3840/7378 [13:10:27<12:09:55, 12.38s/it] + +{'loss': 0.4108, 'learning_rate': 9.824402951853358e-06, 'epoch': 0.52} + + 52%|█████▏ | 3840/7378 [13:10:27<12:09:55, 12.38s/it] + 52%|█████▏ | 3841/7378 [13:10:39<12:02:29, 12.26s/it] + +{'loss': 0.5073, 'learning_rate': 9.820013493969487e-06, 'epoch': 0.52} + + 52%|█████▏ | 3841/7378 [13:10:39<12:02:29, 12.26s/it] + 52%|█████▏ | 3842/7378 [13:10:52<12:05:52, 12.32s/it] + +{'loss': 0.4818, 'learning_rate': 9.815624070775195e-06, 'epoch': 0.52} + + 52%|█████▏ | 3842/7378 [13:10:52<12:05:52, 12.32s/it] + 52%|█████▏ | 3843/7378 [13:11:04<12:05:42, 12.32s/it] + +{'loss': 0.4544, 'learning_rate': 9.811234683116475e-06, 'epoch': 0.52} + + 52%|█████▏ | 3843/7378 [13:11:04<12:05:42, 12.32s/it] + 52%|█████▏ | 3844/7378 [13:11:16<12:00:28, 12.23s/it] + +{'loss': 0.4309, 
'learning_rate': 9.806845331839311e-06, 'epoch': 0.52} + + 52%|█████▏ | 3844/7378 [13:11:16<12:00:28, 12.23s/it] + 52%|█████▏ | 3845/7378 [13:11:28<12:01:01, 12.24s/it] + +{'loss': 0.4314, 'learning_rate': 9.802456017789683e-06, 'epoch': 0.52} + + 52%|█████▏ | 3845/7378 [13:11:28<12:01:01, 12.24s/it] + 52%|█████▏ | 3846/7378 [13:11:41<12:07:07, 12.35s/it] + +{'loss': 0.4623, 'learning_rate': 9.79806674181356e-06, 'epoch': 0.52} + + 52%|█████▏ | 3846/7378 [13:11:41<12:07:07, 12.35s/it] + 52%|█████▏ | 3847/7378 [13:11:53<11:58:02, 12.20s/it] + +{'loss': 0.457, 'learning_rate': 9.793677504756909e-06, 'epoch': 0.52} + + 52%|█████▏ | 3847/7378 [13:11:53<11:58:02, 12.20s/it] + 52%|█████▏ | 3848/7378 [13:12:06<12:08:03, 12.37s/it] + +{'loss': 0.4681, 'learning_rate': 9.789288307465684e-06, 'epoch': 0.52} + + 52%|█████▏ | 3848/7378 [13:12:06<12:08:03, 12.37s/it] + 52%|█████▏ | 3849/7378 [13:12:17<11:57:38, 12.20s/it] + +{'loss': 0.4553, 'learning_rate': 9.784899150785838e-06, 'epoch': 0.52} + + 52%|█████▏ | 3849/7378 [13:12:17<11:57:38, 12.20s/it] + 52%|█████▏ | 3850/7378 [13:12:30<12:10:07, 12.42s/it] + +{'loss': 0.5242, 'learning_rate': 9.780510035563306e-06, 'epoch': 0.52} + + 52%|█████▏ | 3850/7378 [13:12:30<12:10:07, 12.42s/it] + 52%|█████▏ | 3851/7378 [13:12:43<12:09:15, 12.41s/it] + +{'loss': 0.4494, 'learning_rate': 9.77612096264403e-06, 'epoch': 0.52} + + 52%|█████▏ | 3851/7378 [13:12:43<12:09:15, 12.41s/it] + 52%|█████▏ | 3852/7378 [13:12:55<12:04:08, 12.32s/it] + +{'loss': 0.4489, 'learning_rate': 9.771731932873927e-06, 'epoch': 0.52} + + 52%|█████▏ | 3852/7378 [13:12:55<12:04:08, 12.32s/it] + 52%|█████▏ | 3853/7378 [13:13:08<12:10:25, 12.43s/it] + +{'loss': 0.45, 'learning_rate': 9.767342947098916e-06, 'epoch': 0.52} + + 52%|█████▏ | 3853/7378 [13:13:08<12:10:25, 12.43s/it] + 52%|█████▏ | 3854/7378 [13:13:20<12:03:34, 12.32s/it] + +{'loss': 0.4567, 'learning_rate': 9.762954006164908e-06, 'epoch': 0.52} + + 52%|█████▏ | 3854/7378 [13:13:20<12:03:34, 12.32s/it] + 
52%|█████▏ | 3855/7378 [13:13:32<12:02:57, 12.31s/it] + +{'loss': 0.488, 'learning_rate': 9.758565110917797e-06, 'epoch': 0.52} + + 52%|█████▏ | 3855/7378 [13:13:32<12:02:57, 12.31s/it] + 52%|█████▏ | 3856/7378 [13:13:44<12:04:40, 12.35s/it] + +{'loss': 0.4989, 'learning_rate': 9.75417626220348e-06, 'epoch': 0.52} + + 52%|█████▏ | 3856/7378 [13:13:44<12:04:40, 12.35s/it] + 52%|█████▏ | 3857/7378 [13:13:57<12:11:18, 12.46s/it] + +{'loss': 0.5354, 'learning_rate': 9.749787460867835e-06, 'epoch': 0.52} + + 52%|█████▏ | 3857/7378 [13:13:57<12:11:18, 12.46s/it] + 52%|█████▏ | 3858/7378 [13:14:10<12:13:58, 12.51s/it] + +{'loss': 0.4414, 'learning_rate': 9.745398707756735e-06, 'epoch': 0.52} + + 52%|█████▏ | 3858/7378 [13:14:10<12:13:58, 12.51s/it] + 52%|█████▏ | 3859/7378 [13:14:22<12:09:15, 12.43s/it] + +{'loss': 0.4802, 'learning_rate': 9.741010003716045e-06, 'epoch': 0.52} + + 52%|█████▏ | 3859/7378 [13:14:22<12:09:15, 12.43s/it] + 52%|█████▏ | 3860/7378 [13:14:34<12:08:33, 12.43s/it] + +{'loss': 0.3936, 'learning_rate': 9.736621349591619e-06, 'epoch': 0.52} + + 52%|█████▏ | 3860/7378 [13:14:34<12:08:33, 12.43s/it] + 52%|█████▏ | 3861/7378 [13:14:46<12:00:29, 12.29s/it] + +{'loss': 0.4072, 'learning_rate': 9.7322327462293e-06, 'epoch': 0.52} + + 52%|█████▏ | 3861/7378 [13:14:46<12:00:29, 12.29s/it] + 52%|█████▏ | 3862/7378 [13:14:59<12:01:17, 12.31s/it] + +{'loss': 0.4655, 'learning_rate': 9.72784419447492e-06, 'epoch': 0.52} + + 52%|█████▏ | 3862/7378 [13:14:59<12:01:17, 12.31s/it] + 52%|█████▏ | 3863/7378 [13:15:11<12:05:44, 12.39s/it] + +{'loss': 0.4164, 'learning_rate': 9.72345569517431e-06, 'epoch': 0.52} + + 52%|█████▏ | 3863/7378 [13:15:11<12:05:44, 12.39s/it] + 52%|█████▏ | 3864/7378 [13:15:24<12:04:15, 12.37s/it] + +{'loss': 0.3842, 'learning_rate': 9.719067249173277e-06, 'epoch': 0.52} + + 52%|█████▏ | 3864/7378 [13:15:24<12:04:15, 12.37s/it] + 52%|█████▏ | 3865/7378 [13:15:36<12:00:53, 12.31s/it] + +{'loss': 0.4917, 'learning_rate': 9.714678857317632e-06, 
'epoch': 0.52} + + 52%|█████▏ | 3865/7378 [13:15:36<12:00:53, 12.31s/it] + 52%|█████▏ | 3866/7378 [13:15:48<11:56:27, 12.24s/it] + +{'loss': 0.4333, 'learning_rate': 9.710290520453162e-06, 'epoch': 0.52} + + 52%|█████▏ | 3866/7378 [13:15:48<11:56:27, 12.24s/it] + 52%|█████▏ | 3867/7378 [13:16:00<11:51:02, 12.15s/it] + +{'loss': 0.3983, 'learning_rate': 9.705902239425655e-06, 'epoch': 0.52} + + 52%|█████▏ | 3867/7378 [13:16:00<11:51:02, 12.15s/it] + 52%|█████▏ | 3868/7378 [13:16:12<11:47:57, 12.10s/it] + +{'loss': 0.4708, 'learning_rate': 9.701514015080886e-06, 'epoch': 0.52} + + 52%|█████▏ | 3868/7378 [13:16:12<11:47:57, 12.10s/it] + 52%|█████▏ | 3869/7378 [13:16:24<11:53:33, 12.20s/it] + +{'loss': 0.4281, 'learning_rate': 9.697125848264608e-06, 'epoch': 0.52} + + 52%|█████▏ | 3869/7378 [13:16:24<11:53:33, 12.20s/it] + 52%|█████▏ | 3870/7378 [13:16:36<11:52:23, 12.18s/it] + +{'loss': 0.4116, 'learning_rate': 9.692737739822582e-06, 'epoch': 0.52} + + 52%|█████▏ | 3870/7378 [13:16:36<11:52:23, 12.18s/it] + 52%|█████▏ | 3871/7378 [13:16:48<11:49:37, 12.14s/it] + +{'loss': 0.4558, 'learning_rate': 9.688349690600538e-06, 'epoch': 0.52} + + 52%|█████▏ | 3871/7378 [13:16:48<11:49:37, 12.14s/it] + 52%|█████▏ | 3872/7378 [13:17:00<11:48:54, 12.13s/it] + +{'loss': 0.5392, 'learning_rate': 9.683961701444208e-06, 'epoch': 0.52} + + 52%|█████▏ | 3872/7378 [13:17:00<11:48:54, 12.13s/it] + 52%|█████▏ | 3873/7378 [13:17:13<11:53:37, 12.22s/it] + +{'loss': 0.506, 'learning_rate': 9.679573773199309e-06, 'epoch': 0.52} + + 52%|█████▏ | 3873/7378 [13:17:13<11:53:37, 12.22s/it] + 53%|█████▎ | 3874/7378 [13:17:26<12:04:14, 12.40s/it] + +{'loss': 0.4707, 'learning_rate': 9.675185906711539e-06, 'epoch': 0.53} + + 53%|█████▎ | 3874/7378 [13:17:26<12:04:14, 12.40s/it] + 53%|█████▎ | 3875/7378 [13:17:38<11:57:07, 12.28s/it] + +{'loss': 0.4243, 'learning_rate': 9.670798102826598e-06, 'epoch': 0.53} + + 53%|█████▎ | 3875/7378 [13:17:38<11:57:07, 12.28s/it] + 53%|█████▎ | 3876/7378 
[13:17:50<11:55:49, 12.26s/it] + +{'loss': 0.435, 'learning_rate': 9.666410362390162e-06, 'epoch': 0.53} + + 53%|█████▎ | 3876/7378 [13:17:50<11:55:49, 12.26s/it] + 53%|█████▎ | 3877/7378 [13:18:02<11:57:38, 12.30s/it] + +{'loss': 0.4888, 'learning_rate': 9.662022686247903e-06, 'epoch': 0.53} + + 53%|█████▎ | 3877/7378 [13:18:02<11:57:38, 12.30s/it] + 53%|█████▎ | 3878/7378 [13:18:15<11:57:54, 12.31s/it] + +{'loss': 0.4394, 'learning_rate': 9.657635075245473e-06, 'epoch': 0.53} + + 53%|█████▎ | 3878/7378 [13:18:15<11:57:54, 12.31s/it] + 53%|█████▎ | 3879/7378 [13:18:27<12:05:18, 12.44s/it] + +{'loss': 0.4738, 'learning_rate': 9.653247530228516e-06, 'epoch': 0.53} + + 53%|█████▎ | 3879/7378 [13:18:27<12:05:18, 12.44s/it] + 53%|█████▎ | 3880/7378 [13:18:40<12:06:13, 12.46s/it] + +{'loss': 0.4906, 'learning_rate': 9.648860052042665e-06, 'epoch': 0.53} + + 53%|█████▎ | 3880/7378 [13:18:40<12:06:13, 12.46s/it] + 53%|█████▎ | 3881/7378 [13:18:52<12:04:15, 12.43s/it] + +{'loss': 0.3896, 'learning_rate': 9.644472641533536e-06, 'epoch': 0.53} + + 53%|█████▎ | 3881/7378 [13:18:52<12:04:15, 12.43s/it] + 53%|█████▎ | 3882/7378 [13:19:04<12:00:00, 12.36s/it] + +{'loss': 0.4561, 'learning_rate': 9.640085299546734e-06, 'epoch': 0.53} + + 53%|█████▎ | 3882/7378 [13:19:04<12:00:00, 12.36s/it] + 53%|█████▎ | 3883/7378 [13:19:17<11:59:21, 12.35s/it] + +{'loss': 0.4124, 'learning_rate': 9.635698026927846e-06, 'epoch': 0.53} + + 53%|█████▎ | 3883/7378 [13:19:17<11:59:21, 12.35s/it] + 53%|█████▎ | 3884/7378 [13:19:29<12:02:51, 12.41s/it] + +{'loss': 0.4177, 'learning_rate': 9.63131082452246e-06, 'epoch': 0.53} + + 53%|█████▎ | 3884/7378 [13:19:29<12:02:51, 12.41s/it] + 53%|█████▎ | 3885/7378 [13:19:42<12:01:11, 12.39s/it] + +{'loss': 0.4138, 'learning_rate': 9.62692369317613e-06, 'epoch': 0.53} + + 53%|█████▎ | 3885/7378 [13:19:42<12:01:11, 12.39s/it] + 53%|█████▎ | 3886/7378 [13:19:54<11:51:54, 12.23s/it] + +{'loss': 0.4499, 'learning_rate': 9.622536633734413e-06, 'epoch': 0.53} + + 
53%|█████▎ | 3886/7378 [13:19:54<11:51:54, 12.23s/it] + 53%|█████▎ | 3887/7378 [13:20:06<11:52:49, 12.25s/it] + +{'loss': 0.4144, 'learning_rate': 9.618149647042847e-06, 'epoch': 0.53} + + 53%|█████▎ | 3887/7378 [13:20:06<11:52:49, 12.25s/it] + 53%|█████▎ | 3888/7378 [13:20:18<11:43:29, 12.09s/it] + +{'loss': 0.4711, 'learning_rate': 9.61376273394695e-06, 'epoch': 0.53} + + 53%|█████▎ | 3888/7378 [13:20:18<11:43:29, 12.09s/it] + 53%|█████▎ | 3889/7378 [13:20:29<11:41:04, 12.06s/it] + +{'loss': 0.4153, 'learning_rate': 9.609375895292232e-06, 'epoch': 0.53} + + 53%|█████▎ | 3889/7378 [13:20:30<11:41:04, 12.06s/it] + 53%|█████▎ | 3890/7378 [13:20:42<11:47:18, 12.17s/it] + +{'loss': 0.4475, 'learning_rate': 9.60498913192419e-06, 'epoch': 0.53} + + 53%|█████▎ | 3890/7378 [13:20:42<11:47:18, 12.17s/it] + 53%|█████▎ | 3891/7378 [13:20:54<11:48:01, 12.18s/it] + +{'loss': 0.4326, 'learning_rate': 9.6006024446883e-06, 'epoch': 0.53} + + 53%|█████▎ | 3891/7378 [13:20:54<11:48:01, 12.18s/it] + 53%|█████▎ | 3892/7378 [13:21:07<11:57:14, 12.34s/it] + +{'loss': 0.4579, 'learning_rate': 9.596215834430031e-06, 'epoch': 0.53} + + 53%|█████▎ | 3892/7378 [13:21:07<11:57:14, 12.34s/it] + 53%|█████▎ | 3893/7378 [13:21:19<11:53:26, 12.28s/it] + +{'loss': 0.4384, 'learning_rate': 9.591829301994833e-06, 'epoch': 0.53} + + 53%|█████▎ | 3893/7378 [13:21:19<11:53:26, 12.28s/it] + 53%|█████▎ | 3894/7378 [13:21:31<11:49:05, 12.21s/it] + +{'loss': 0.4757, 'learning_rate': 9.587442848228138e-06, 'epoch': 0.53} + + 53%|█████▎ | 3894/7378 [13:21:31<11:49:05, 12.21s/it] + 53%|█████▎ | 3895/7378 [13:21:43<11:48:02, 12.20s/it] + +{'loss': 0.4789, 'learning_rate': 9.583056473975371e-06, 'epoch': 0.53} + + 53%|█████▎ | 3895/7378 [13:21:43<11:48:02, 12.20s/it] + 53%|█████▎ | 3896/7378 [13:21:56<11:51:27, 12.26s/it] + +{'loss': 0.4382, 'learning_rate': 9.578670180081935e-06, 'epoch': 0.53} + + 53%|█████▎ | 3896/7378 [13:21:56<11:51:27, 12.26s/it] + 53%|█████▎ | 3897/7378 [13:22:08<11:47:24, 12.19s/it] +
+{'loss': 0.3819, 'learning_rate': 9.574283967393215e-06, 'epoch': 0.53} + + 53%|█████▎ | 3897/7378 [13:22:08<11:47:24, 12.19s/it] + 53%|█████▎ | 3898/7378 [13:22:20<11:57:46, 12.38s/it] + +{'loss': 0.5149, 'learning_rate': 9.569897836754592e-06, 'epoch': 0.53} + + 53%|█████▎ | 3898/7378 [13:22:20<11:57:46, 12.38s/it] + 53%|█████▎ | 3899/7378 [13:22:33<11:59:18, 12.41s/it] + +{'loss': 0.386, 'learning_rate': 9.565511789011418e-06, 'epoch': 0.53} + + 53%|█████▎ | 3899/7378 [13:22:33<11:59:18, 12.41s/it] + 53%|█████▎ | 3900/7378 [13:22:45<11:57:43, 12.38s/it] + +{'loss': 0.4726, 'learning_rate': 9.56112582500904e-06, 'epoch': 0.53} + + 53%|█████▎ | 3900/7378 [13:22:45<11:57:43, 12.38s/it] + 53%|█████▎ | 3901/7378 [13:22:58<11:55:50, 12.35s/it] + +{'loss': 0.4649, 'learning_rate': 9.556739945592779e-06, 'epoch': 0.53} + + 53%|█████▎ | 3901/7378 [13:22:58<11:55:50, 12.35s/it] + 53%|█████▎ | 3902/7378 [13:23:10<11:56:34, 12.37s/it] + +{'loss': 0.4515, 'learning_rate': 9.552354151607948e-06, 'epoch': 0.53} + + 53%|█████▎ | 3902/7378 [13:23:10<11:56:34, 12.37s/it] + 53%|█████▎ | 3903/7378 [13:23:22<11:52:07, 12.30s/it] + +{'loss': 0.5067, 'learning_rate': 9.54796844389984e-06, 'epoch': 0.53} + + 53%|█████▎ | 3903/7378 [13:23:22<11:52:07, 12.30s/it] + 53%|█████▎ | 3904/7378 [13:23:34<11:50:39, 12.27s/it] + +{'loss': 0.5002, 'learning_rate': 9.54358282331373e-06, 'epoch': 0.53} + + 53%|█████▎ | 3904/7378 [13:23:34<11:50:39, 12.27s/it] + 53%|█████▎ | 3905/7378 [13:23:46<11:46:53, 12.21s/it] + +{'loss': 0.4766, 'learning_rate': 9.539197290694877e-06, 'epoch': 0.53} + + 53%|█████▎ | 3905/7378 [13:23:46<11:46:53, 12.21s/it] + 53%|█████▎ | 3906/7378 [13:23:59<11:49:32, 12.26s/it] + +{'loss': 0.4305, 'learning_rate': 9.534811846888524e-06, 'epoch': 0.53} + + 53%|█████▎ | 3906/7378 [13:23:59<11:49:32, 12.26s/it] + 53%|█████▎ | 3907/7378 [13:24:11<11:50:11, 12.28s/it] + +{'loss': 0.468, 'learning_rate': 9.5304264927399e-06, 'epoch': 0.53} + + 53%|█████▎ | 3907/7378 
[13:24:11<11:50:11, 12.28s/it] + 53%|█████▎ | 3908/7378 [13:24:23<11:48:03, 12.24s/it] + +{'loss': 0.4616, 'learning_rate': 9.526041229094206e-06, 'epoch': 0.53} + + 53%|█████▎ | 3908/7378 [13:24:23<11:48:03, 12.24s/it] + 53%|█████▎ | 3909/7378 [13:24:36<11:50:12, 12.28s/it] + +{'loss': 0.4878, 'learning_rate': 9.521656056796643e-06, 'epoch': 0.53} + + 53%|█████▎ | 3909/7378 [13:24:36<11:50:12, 12.28s/it] + 53%|█████▎ | 3910/7378 [13:24:48<11:46:16, 12.22s/it] + +{'loss': 0.3986, 'learning_rate': 9.51727097669237e-06, 'epoch': 0.53} + + 53%|█████▎ | 3910/7378 [13:24:48<11:46:16, 12.22s/it] + 53%|█████▎ | 3911/7378 [13:25:00<11:43:42, 12.18s/it] + +{'loss': 0.5331, 'learning_rate': 9.512885989626555e-06, 'epoch': 0.53} + + 53%|█████▎ | 3911/7378 [13:25:00<11:43:42, 12.18s/it] + 53%|█████▎ | 3912/7378 [13:25:12<11:48:59, 12.27s/it] + +{'loss': 0.5406, 'learning_rate': 9.508501096444335e-06, 'epoch': 0.53} + + 53%|█████▎ | 3912/7378 [13:25:12<11:48:59, 12.27s/it] + 53%|█████▎ | 3913/7378 [13:25:24<11:44:41, 12.20s/it] + +{'loss': 0.4049, 'learning_rate': 9.504116297990826e-06, 'epoch': 0.53} + + 53%|█████▎ | 3913/7378 [13:25:24<11:44:41, 12.20s/it] + 53%|█████▎ | 3914/7378 [13:25:37<11:57:41, 12.43s/it] + +{'loss': 0.4919, 'learning_rate': 9.499731595111125e-06, 'epoch': 0.53} + + 53%|█████▎ | 3914/7378 [13:25:37<11:57:41, 12.43s/it] + 53%|█████▎ | 3915/7378 [13:25:50<11:56:13, 12.41s/it] + +{'loss': 0.4498, 'learning_rate': 9.495346988650323e-06, 'epoch': 0.53} + + 53%|█████▎ | 3915/7378 [13:25:50<11:56:13, 12.41s/it] + 53%|█████▎ | 3916/7378 [13:26:02<11:48:52, 12.29s/it] + +{'loss': 0.4518, 'learning_rate': 9.490962479453478e-06, 'epoch': 0.53} + + 53%|█████▎ | 3916/7378 [13:26:02<11:48:52, 12.29s/it] + 53%|█████▎ | 3917/7378 [13:26:14<11:50:18, 12.31s/it] + +{'loss': 0.4331, 'learning_rate': 9.48657806836564e-06, 'epoch': 0.53} + + 53%|█████▎ | 3917/7378 [13:26:14<11:50:18, 12.31s/it] + 53%|█████▎ | 3918/7378 [13:26:26<11:51:47, 12.34s/it] + +{'loss': 0.4445, 
'learning_rate': 9.48219375623183e-06, 'epoch': 0.53} + + 53%|█████▎ | 3918/7378 [13:26:26<11:51:47, 12.34s/it] + 53%|█████▎ | 3919/7378 [13:26:39<11:51:41, 12.34s/it] + +{'loss': 0.4635, 'learning_rate': 9.477809543897061e-06, 'epoch': 0.53} + + 53%|█████▎ | 3919/7378 [13:26:39<11:51:41, 12.34s/it] + 53%|█████▎ | 3920/7378 [13:26:51<11:49:11, 12.31s/it] + +{'loss': 0.4468, 'learning_rate': 9.473425432206315e-06, 'epoch': 0.53} + + 53%|█████▎ | 3920/7378 [13:26:51<11:49:11, 12.31s/it] + 53%|█████▎ | 3921/7378 [13:27:03<11:44:17, 12.22s/it] + +{'loss': 0.437, 'learning_rate': 9.46904142200457e-06, 'epoch': 0.53} + + 53%|█████▎ | 3921/7378 [13:27:03<11:44:17, 12.22s/it] + 53%|█████▎ | 3922/7378 [13:27:16<11:49:41, 12.32s/it] + +{'loss': 0.4322, 'learning_rate': 9.464657514136768e-06, 'epoch': 0.53} + + 53%|█████▎ | 3922/7378 [13:27:16<11:49:41, 12.32s/it] + 53%|█████▎ | 3923/7378 [13:27:28<11:48:38, 12.31s/it] + +{'loss': 0.4584, 'learning_rate': 9.460273709447838e-06, 'epoch': 0.53} + + 53%|█████▎ | 3923/7378 [13:27:28<11:48:38, 12.31s/it] + 53%|█████▎ | 3924/7378 [13:27:40<11:42:26, 12.20s/it] + +{'loss': 0.4888, 'learning_rate': 9.455890008782696e-06, 'epoch': 0.53} + + 53%|█████▎ | 3924/7378 [13:27:40<11:42:26, 12.20s/it] + 53%|█████▎ | 3925/7378 [13:27:52<11:40:04, 12.16s/it] + +{'loss': 0.4819, 'learning_rate': 9.451506412986223e-06, 'epoch': 0.53} + + 53%|█████▎ | 3925/7378 [13:27:52<11:40:04, 12.16s/it] + 53%|█████▎ | 3926/7378 [13:28:04<11:40:25, 12.17s/it] + +{'loss': 0.4183, 'learning_rate': 9.447122922903298e-06, 'epoch': 0.53} + + 53%|█████▎ | 3926/7378 [13:28:04<11:40:25, 12.17s/it] + 53%|█████▎ | 3927/7378 [13:28:16<11:40:40, 12.18s/it] + +{'loss': 0.397, 'learning_rate': 9.44273953937876e-06, 'epoch': 0.53} + + 53%|█████▎ | 3927/7378 [13:28:16<11:40:40, 12.18s/it] + 53%|█████▎ | 3928/7378 [13:28:28<11:40:42, 12.19s/it] + +{'loss': 0.5156, 'learning_rate': 9.438356263257446e-06, 'epoch': 0.53} + + 53%|█████▎ | 3928/7378 [13:28:28<11:40:42, 12.19s/it] + 
53%|█████▎ | 3929/7378 [13:28:41<11:43:46, 12.24s/it] + +{'loss': 0.4466, 'learning_rate': 9.43397309538416e-06, 'epoch': 0.53} + + 53%|█████▎ | 3929/7378 [13:28:41<11:43:46, 12.24s/it] + 53%|█████▎ | 3930/7378 [13:28:53<11:48:21, 12.33s/it] + +{'loss': 0.4488, 'learning_rate': 9.429590036603688e-06, 'epoch': 0.53} + + 53%|█████▎ | 3930/7378 [13:28:53<11:48:21, 12.33s/it] + 53%|█████▎ | 3931/7378 [13:29:06<11:48:18, 12.33s/it] + +{'loss': 0.4516, 'learning_rate': 9.425207087760799e-06, 'epoch': 0.53} + + 53%|█████▎ | 3931/7378 [13:29:06<11:48:18, 12.33s/it] + 53%|█████▎ | 3932/7378 [13:29:18<11:42:06, 12.22s/it] + +{'loss': 0.4426, 'learning_rate': 9.420824249700234e-06, 'epoch': 0.53} + + 53%|█████▎ | 3932/7378 [13:29:18<11:42:06, 12.22s/it] + 53%|█████▎ | 3933/7378 [13:29:31<11:57:13, 12.49s/it] + +{'loss': 0.4408, 'learning_rate': 9.41644152326672e-06, 'epoch': 0.53} + + 53%|█████▎ | 3933/7378 [13:29:31<11:57:13, 12.49s/it] + 53%|█████▎ | 3934/7378 [13:29:43<11:50:39, 12.38s/it] + +{'loss': 0.4375, 'learning_rate': 9.412058909304956e-06, 'epoch': 0.53} + + 53%|█████▎ | 3934/7378 [13:29:43<11:50:39, 12.38s/it] + 53%|█████▎ | 3935/7378 [13:29:56<12:05:14, 12.64s/it] + +{'loss': 0.4747, 'learning_rate': 9.407676408659623e-06, 'epoch': 0.53} + + 53%|█████▎ | 3935/7378 [13:29:56<12:05:14, 12.64s/it] + 53%|█████▎ | 3936/7378 [13:30:09<12:03:07, 12.61s/it] + +{'loss': 0.4332, 'learning_rate': 9.40329402217538e-06, 'epoch': 0.53} + + 53%|█████▎ | 3936/7378 [13:30:09<12:03:07, 12.61s/it] + 53%|█████▎ | 3937/7378 [13:30:21<11:55:14, 12.47s/it] + +{'loss': 0.4454, 'learning_rate': 9.398911750696864e-06, 'epoch': 0.53} + + 53%|█████▎ | 3937/7378 [13:30:21<11:55:14, 12.47s/it] + 53%|█████▎ | 3938/7378 [13:30:34<12:00:17, 12.56s/it] + +{'loss': 0.4282, 'learning_rate': 9.394529595068686e-06, 'epoch': 0.53} + + 53%|█████▎ | 3938/7378 [13:30:34<12:00:17, 12.56s/it] + 53%|█████▎ | 3939/7378 [13:30:46<11:58:54, 12.54s/it] + +{'loss': 0.4878, 'learning_rate': 9.39014755613544e-06, 
'epoch': 0.53} + + 53%|█████▎ | 3939/7378 [13:30:46<11:58:54, 12.54s/it] + 53%|█████▎ | 3940/7378 [13:30:58<11:52:42, 12.44s/it] + +{'loss': 0.4235, 'learning_rate': 9.385765634741696e-06, 'epoch': 0.53} + + 53%|█████▎ | 3940/7378 [13:30:58<11:52:42, 12.44s/it] + 53%|█████▎ | 3941/7378 [13:31:11<11:56:07, 12.50s/it] + +{'loss': 0.4736, 'learning_rate': 9.381383831731998e-06, 'epoch': 0.53} + + 53%|█████▎ | 3941/7378 [13:31:11<11:56:07, 12.50s/it] + 53%|█████▎ | 3942/7378 [13:31:23<11:54:21, 12.47s/it] + +{'loss': 0.4505, 'learning_rate': 9.377002147950875e-06, 'epoch': 0.53} + + 53%|█████▎ | 3942/7378 [13:31:23<11:54:21, 12.47s/it] + 53%|█████▎ | 3943/7378 [13:31:35<11:47:20, 12.36s/it] + +{'loss': 0.4137, 'learning_rate': 9.37262058424282e-06, 'epoch': 0.53} + + 53%|█████▎ | 3943/7378 [13:31:35<11:47:20, 12.36s/it] + 53%|█████▎ | 3944/7378 [13:31:48<11:42:53, 12.28s/it] + +{'loss': 0.5146, 'learning_rate': 9.36823914145232e-06, 'epoch': 0.53} + + 53%|█████▎ | 3944/7378 [13:31:48<11:42:53, 12.28s/it] + 53%|█████▎ | 3945/7378 [13:32:00<11:45:03, 12.32s/it] + +{'loss': 0.4444, 'learning_rate': 9.36385782042382e-06, 'epoch': 0.53} + + 53%|█████▎ | 3945/7378 [13:32:00<11:45:03, 12.32s/it] + 53%|█████▎ | 3946/7378 [13:32:12<11:43:42, 12.30s/it] + +{'loss': 0.4723, 'learning_rate': 9.35947662200176e-06, 'epoch': 0.53} + + 53%|█████▎ | 3946/7378 [13:32:12<11:43:42, 12.30s/it] + 53%|█████▎ | 3947/7378 [13:32:24<11:39:08, 12.23s/it] + +{'loss': 0.3711, 'learning_rate': 9.355095547030543e-06, 'epoch': 0.53} + + 53%|█████▎ | 3947/7378 [13:32:24<11:39:08, 12.23s/it] + 54%|█████▎ | 3948/7378 [13:32:36<11:37:15, 12.20s/it] + +{'loss': 0.4718, 'learning_rate': 9.35071459635455e-06, 'epoch': 0.54} + + 54%|█████▎ | 3948/7378 [13:32:36<11:37:15, 12.20s/it] + 54%|█████▎ | 3949/7378 [13:32:49<11:42:05, 12.29s/it] + +{'loss': 0.4469, 'learning_rate': 9.346333770818145e-06, 'epoch': 0.54} + + 54%|█████▎ | 3949/7378 [13:32:49<11:42:05, 12.29s/it] + 54%|█████▎ | 3950/7378 
[13:33:01<11:45:31, 12.35s/it] + +{'loss': 0.5283, 'learning_rate': 9.341953071265659e-06, 'epoch': 0.54} + + 54%|█████▎ | 3950/7378 [13:33:01<11:45:31, 12.35s/it] + 54%|█████▎ | 3951/7378 [13:33:13<11:38:53, 12.24s/it] + +{'loss': 0.4987, 'learning_rate': 9.33757249854141e-06, 'epoch': 0.54} + + 54%|█████▎ | 3951/7378 [13:33:13<11:38:53, 12.24s/it] + 54%|█████▎ | 3952/7378 [13:33:26<11:44:36, 12.34s/it] + +{'loss': 0.4616, 'learning_rate': 9.333192053489675e-06, 'epoch': 0.54} + + 54%|█████▎ | 3952/7378 [13:33:26<11:44:36, 12.34s/it] + 54%|█████▎ | 3953/7378 [13:33:38<11:46:42, 12.38s/it] + +{'loss': 0.4887, 'learning_rate': 9.328811736954722e-06, 'epoch': 0.54} + + 54%|█████▎ | 3953/7378 [13:33:38<11:46:42, 12.38s/it] + 54%|█████▎ | 3954/7378 [13:33:50<11:41:01, 12.28s/it] + +{'loss': 0.4595, 'learning_rate': 9.324431549780792e-06, 'epoch': 0.54} + + 54%|█████▎ | 3954/7378 [13:33:50<11:41:01, 12.28s/it] + 54%|█████▎ | 3955/7378 [13:34:03<11:36:58, 12.22s/it] + +{'loss': 0.4333, 'learning_rate': 9.32005149281209e-06, 'epoch': 0.54} + + 54%|█████▎ | 3955/7378 [13:34:03<11:36:58, 12.22s/it] + 54%|█████▎ | 3956/7378 [13:34:15<11:38:42, 12.25s/it] + +{'loss': 0.4, 'learning_rate': 9.315671566892809e-06, 'epoch': 0.54} + + 54%|█████▎ | 3956/7378 [13:34:15<11:38:42, 12.25s/it] + 54%|█████▎ | 3957/7378 [13:34:27<11:41:44, 12.31s/it] + +{'loss': 0.4112, 'learning_rate': 9.31129177286711e-06, 'epoch': 0.54} + + 54%|█████▎ | 3957/7378 [13:34:27<11:41:44, 12.31s/it] + 54%|█████▎ | 3958/7378 [13:34:39<11:39:04, 12.26s/it] + +{'loss': 0.4335, 'learning_rate': 9.306912111579127e-06, 'epoch': 0.54} + + 54%|█████▎ | 3958/7378 [13:34:39<11:39:04, 12.26s/it] + 54%|█████▎ | 3959/7378 [13:34:52<11:41:10, 12.30s/it] + +{'loss': 0.4603, 'learning_rate': 9.302532583872974e-06, 'epoch': 0.54} + + 54%|█████▎ | 3959/7378 [13:34:52<11:41:10, 12.30s/it] + 54%|█████▎ | 3960/7378 [13:35:05<11:53:52, 12.53s/it] + +{'loss': 0.4241, 'learning_rate': 9.298153190592732e-06, 'epoch': 0.54} + + 
54%|█████▎ | 3960/7378 [13:35:05<11:53:52, 12.53s/it] + 54%|█████▎ | 3961/7378 [13:35:17<11:51:44, 12.50s/it] + +{'loss': 0.4703, 'learning_rate': 9.293773932582467e-06, 'epoch': 0.54} + + 54%|█████▎ | 3961/7378 [13:35:17<11:51:44, 12.50s/it] + 54%|█████▎ | 3962/7378 [13:35:30<11:54:15, 12.55s/it] + +{'loss': 0.442, 'learning_rate': 9.289394810686206e-06, 'epoch': 0.54} + + 54%|█████▎ | 3962/7378 [13:35:30<11:54:15, 12.55s/it] + 54%|█████▎ | 3963/7378 [13:35:43<11:55:08, 12.56s/it] + +{'loss': 0.4543, 'learning_rate': 9.285015825747962e-06, 'epoch': 0.54} + + 54%|█████▎ | 3963/7378 [13:35:43<11:55:08, 12.56s/it] + 54%|█████▎ | 3964/7378 [13:35:55<11:46:03, 12.41s/it] + +{'loss': 0.4679, 'learning_rate': 9.280636978611712e-06, 'epoch': 0.54} + + 54%|█████▎ | 3964/7378 [13:35:55<11:46:03, 12.41s/it] + 54%|█████▎ | 3965/7378 [13:36:07<11:41:37, 12.33s/it] + +{'loss': 0.4582, 'learning_rate': 9.276258270121407e-06, 'epoch': 0.54} + + 54%|█████▎ | 3965/7378 [13:36:07<11:41:37, 12.33s/it] + 54%|█████▍ | 3966/7378 [13:36:19<11:38:59, 12.29s/it] + +{'loss': 0.4356, 'learning_rate': 9.271879701120981e-06, 'epoch': 0.54} + + 54%|█████▍ | 3966/7378 [13:36:19<11:38:59, 12.29s/it] + 54%|█████▍ | 3967/7378 [13:36:31<11:36:16, 12.25s/it] + +{'loss': 0.4158, 'learning_rate': 9.267501272454331e-06, 'epoch': 0.54} + + 54%|█████▍ | 3967/7378 [13:36:31<11:36:16, 12.25s/it] + 54%|█████▍ | 3968/7378 [13:36:44<11:47:50, 12.45s/it] + +{'loss': 0.4692, 'learning_rate': 9.263122984965332e-06, 'epoch': 0.54} + + 54%|█████▍ | 3968/7378 [13:36:44<11:47:50, 12.45s/it] + 54%|█████▍ | 3969/7378 [13:36:57<11:57:29, 12.63s/it] + +{'loss': 0.4498, 'learning_rate': 9.258744839497827e-06, 'epoch': 0.54} + + 54%|█████▍ | 3969/7378 [13:36:57<11:57:29, 12.63s/it] + 54%|█████▍ | 3970/7378 [13:37:10<11:54:22, 12.58s/it] + +{'loss': 0.5275, 'learning_rate': 9.254366836895641e-06, 'epoch': 0.54} + + 54%|█████▍ | 3970/7378 [13:37:10<11:54:22, 12.58s/it] + 54%|█████▍ | 3971/7378 [13:37:22<11:53:39, 12.57s/it] 
+ +{'loss': 0.4258, 'learning_rate': 9.24998897800256e-06, 'epoch': 0.54} + + 54%|█████▍ | 3971/7378 [13:37:22<11:53:39, 12.57s/it] + 54%|█████▍ | 3972/7378 [13:37:35<11:52:43, 12.56s/it] + +{'loss': 0.4698, 'learning_rate': 9.245611263662351e-06, 'epoch': 0.54} + + 54%|█████▍ | 3972/7378 [13:37:35<11:52:43, 12.56s/it] + 54%|█████▍ | 3973/7378 [13:37:47<11:43:07, 12.39s/it] + +{'loss': 0.472, 'learning_rate': 9.241233694718748e-06, 'epoch': 0.54} + + 54%|█████▍ | 3973/7378 [13:37:47<11:43:07, 12.39s/it] + 54%|█████▍ | 3974/7378 [13:37:59<11:39:33, 12.33s/it] + +{'loss': 0.4738, 'learning_rate': 9.236856272015457e-06, 'epoch': 0.54} + + 54%|█████▍ | 3974/7378 [13:37:59<11:39:33, 12.33s/it] + 54%|█████▍ | 3975/7378 [13:38:11<11:44:37, 12.42s/it] + +{'loss': 0.3924, 'learning_rate': 9.232478996396162e-06, 'epoch': 0.54} + + 54%|█████▍ | 3975/7378 [13:38:11<11:44:37, 12.42s/it] + 54%|█████▍ | 3976/7378 [13:38:24<11:49:02, 12.51s/it] + +{'loss': 0.4112, 'learning_rate': 9.22810186870451e-06, 'epoch': 0.54} + + 54%|█████▍ | 3976/7378 [13:38:24<11:49:02, 12.51s/it] + 54%|█████▍ | 3977/7378 [13:38:37<11:49:38, 12.52s/it] + +{'loss': 0.4758, 'learning_rate': 9.223724889784128e-06, 'epoch': 0.54} + + 54%|█████▍ | 3977/7378 [13:38:37<11:49:38, 12.52s/it] + 54%|█████▍ | 3978/7378 [13:38:49<11:43:35, 12.42s/it] + +{'loss': 0.4237, 'learning_rate': 9.219348060478606e-06, 'epoch': 0.54} + + 54%|█████▍ | 3978/7378 [13:38:49<11:43:35, 12.42s/it] + 54%|█████▍ | 3979/7378 [13:39:01<11:40:13, 12.36s/it] + +{'loss': 0.4169, 'learning_rate': 9.214971381631514e-06, 'epoch': 0.54} + + 54%|█████▍ | 3979/7378 [13:39:01<11:40:13, 12.36s/it] + 54%|█████▍ | 3980/7378 [13:39:13<11:39:52, 12.36s/it] + +{'loss': 0.5075, 'learning_rate': 9.210594854086382e-06, 'epoch': 0.54} + + 54%|█████▍ | 3980/7378 [13:39:13<11:39:52, 12.36s/it] + 54%|█████▍ | 3981/7378 [13:39:25<11:33:50, 12.26s/it] + +{'loss': 0.4925, 'learning_rate': 9.206218478686724e-06, 'epoch': 0.54} + + 54%|█████▍ | 3981/7378 
[13:39:26<11:33:50, 12.26s/it] + 54%|█████▍ | 3982/7378 [13:39:38<11:32:27, 12.23s/it] + +{'loss': 0.4691, 'learning_rate': 9.201842256276012e-06, 'epoch': 0.54} + + 54%|█████▍ | 3982/7378 [13:39:38<11:32:27, 12.23s/it] + 54%|█████▍ | 3983/7378 [13:39:51<11:46:36, 12.49s/it] + +{'loss': 0.4948, 'learning_rate': 9.197466187697697e-06, 'epoch': 0.54} + + 54%|█████▍ | 3983/7378 [13:39:51<11:46:36, 12.49s/it] + 54%|█████▍ | 3984/7378 [13:40:03<11:43:55, 12.44s/it] + +{'loss': 0.4772, 'learning_rate': 9.193090273795199e-06, 'epoch': 0.54} + + 54%|█████▍ | 3984/7378 [13:40:03<11:43:55, 12.44s/it] + 54%|█████▍ | 3985/7378 [13:40:15<11:42:30, 12.42s/it] + +{'loss': 0.4391, 'learning_rate': 9.188714515411902e-06, 'epoch': 0.54} + + 54%|█████▍ | 3985/7378 [13:40:15<11:42:30, 12.42s/it] + 54%|█████▍ | 3986/7378 [13:40:28<11:41:15, 12.40s/it] + +{'loss': 0.4511, 'learning_rate': 9.18433891339117e-06, 'epoch': 0.54} + + 54%|█████▍ | 3986/7378 [13:40:28<11:41:15, 12.40s/it] + 54%|█████▍ | 3987/7378 [13:40:41<11:47:16, 12.51s/it] + +{'loss': 0.421, 'learning_rate': 9.179963468576328e-06, 'epoch': 0.54} + + 54%|█████▍ | 3987/7378 [13:40:41<11:47:16, 12.51s/it] + 54%|█████▍ | 3988/7378 [13:40:53<11:41:33, 12.42s/it] + +{'loss': 0.4235, 'learning_rate': 9.175588181810678e-06, 'epoch': 0.54} + + 54%|█████▍ | 3988/7378 [13:40:53<11:41:33, 12.42s/it] + 54%|█████▍ | 3989/7378 [13:41:05<11:44:14, 12.47s/it] + +{'loss': 0.293, 'learning_rate': 9.171213053937486e-06, 'epoch': 0.54} + + 54%|█████▍ | 3989/7378 [13:41:05<11:44:14, 12.47s/it] + 54%|█████▍ | 3990/7378 [13:41:18<11:38:08, 12.36s/it] + +{'loss': 0.5825, 'learning_rate': 9.166838085799988e-06, 'epoch': 0.54} + + 54%|█████▍ | 3990/7378 [13:41:18<11:38:08, 12.36s/it] + 54%|█████▍ | 3991/7378 [13:41:30<11:34:49, 12.31s/it] + +{'loss': 0.4471, 'learning_rate': 9.162463278241395e-06, 'epoch': 0.54} + + 54%|█████▍ | 3991/7378 [13:41:30<11:34:49, 12.31s/it] + 54%|█████▍ | 3992/7378 [13:41:42<11:34:33, 12.31s/it] + +{'loss': 0.4425, 
'learning_rate': 9.158088632104876e-06, 'epoch': 0.54} + + 54%|█████▍ | 3992/7378 [13:41:42<11:34:33, 12.31s/it] + 54%|█████▍ | 3993/7378 [13:41:54<11:33:45, 12.30s/it] + +{'loss': 0.3899, 'learning_rate': 9.15371414823358e-06, 'epoch': 0.54} + + 54%|█████▍ | 3993/7378 [13:41:54<11:33:45, 12.30s/it] + 54%|█████▍ | 3994/7378 [13:42:07<11:49:04, 12.57s/it] + +{'loss': 0.5227, 'learning_rate': 9.149339827470619e-06, 'epoch': 0.54} + + 54%|█████▍ | 3994/7378 [13:42:07<11:49:04, 12.57s/it] + 54%|█████▍ | 3995/7378 [13:42:20<11:43:04, 12.47s/it] + +{'loss': 0.4711, 'learning_rate': 9.144965670659075e-06, 'epoch': 0.54} + + 54%|█████▍ | 3995/7378 [13:42:20<11:43:04, 12.47s/it] + 54%|█████▍ | 3996/7378 [13:42:32<11:42:35, 12.46s/it] + +{'loss': 0.4709, 'learning_rate': 9.140591678641998e-06, 'epoch': 0.54} + + 54%|█████▍ | 3996/7378 [13:42:32<11:42:35, 12.46s/it] + 54%|█████▍ | 3997/7378 [13:42:45<11:46:12, 12.53s/it] + +{'loss': 0.4715, 'learning_rate': 9.136217852262404e-06, 'epoch': 0.54} + + 54%|█████▍ | 3997/7378 [13:42:45<11:46:12, 12.53s/it] + 54%|█████▍ | 3998/7378 [13:42:57<11:39:42, 12.42s/it] + +{'loss': 0.5327, 'learning_rate': 9.131844192363285e-06, 'epoch': 0.54} + + 54%|█████▍ | 3998/7378 [13:42:57<11:39:42, 12.42s/it] + 54%|█████▍ | 3999/7378 [13:43:10<11:54:18, 12.68s/it] + +{'loss': 0.4857, 'learning_rate': 9.127470699787594e-06, 'epoch': 0.54} + + 54%|█████▍ | 3999/7378 [13:43:10<11:54:18, 12.68s/it] + 54%|█████▍ | 4000/7378 [13:43:23<11:46:59, 12.56s/it] + +{'loss': 0.4803, 'learning_rate': 9.123097375378249e-06, 'epoch': 0.54} + + 54%|█████▍ | 4000/7378 [13:43:23<11:46:59, 12.56s/it] + 54%|█████▍ | 4001/7378 [13:43:35<11:45:59, 12.54s/it] + +{'loss': 0.4591, 'learning_rate': 9.118724219978143e-06, 'epoch': 0.54} + + 54%|█████▍ | 4001/7378 [13:43:35<11:45:59, 12.54s/it] + 54%|█████▍ | 4002/7378 [13:43:47<11:39:36, 12.43s/it] + +{'loss': 0.4416, 'learning_rate': 9.114351234430132e-06, 'epoch': 0.54} + + 54%|█████▍ | 4002/7378 [13:43:47<11:39:36, 
12.43s/it] + 54%|█████▍ | 4003/7378 [13:44:00<11:41:18, 12.47s/it] + +{'loss': 0.4315, 'learning_rate': 9.109978419577044e-06, 'epoch': 0.54} + + 54%|█████▍ | 4003/7378 [13:44:00<11:41:18, 12.47s/it] + 54%|█████▍ | 4004/7378 [13:44:13<11:49:15, 12.61s/it] + +{'loss': 0.532, 'learning_rate': 9.105605776261664e-06, 'epoch': 0.54} + + 54%|█████▍ | 4004/7378 [13:44:13<11:49:15, 12.61s/it] + 54%|█████▍ | 4005/7378 [13:44:25<11:45:28, 12.55s/it] + +{'loss': 0.4679, 'learning_rate': 9.101233305326755e-06, 'epoch': 0.54} + + 54%|█████▍ | 4005/7378 [13:44:25<11:45:28, 12.55s/it] + 54%|█████▍ | 4006/7378 [13:44:37<11:39:58, 12.46s/it] + +{'loss': 0.4451, 'learning_rate': 9.09686100761504e-06, 'epoch': 0.54} + + 54%|█████▍ | 4006/7378 [13:44:37<11:39:58, 12.46s/it] + 54%|█████▍ | 4007/7378 [13:44:50<11:40:55, 12.48s/it] + +{'loss': 0.4088, 'learning_rate': 9.092488883969215e-06, 'epoch': 0.54} + + 54%|█████▍ | 4007/7378 [13:44:50<11:40:55, 12.48s/it] + 54%|█████▍ | 4008/7378 [13:45:03<11:44:01, 12.53s/it] + +{'loss': 0.4837, 'learning_rate': 9.088116935231936e-06, 'epoch': 0.54} + + 54%|█████▍ | 4008/7378 [13:45:03<11:44:01, 12.53s/it] + 54%|█████▍ | 4009/7378 [13:45:15<11:44:14, 12.54s/it] + +{'loss': 0.5042, 'learning_rate': 9.083745162245823e-06, 'epoch': 0.54} + + 54%|█████▍ | 4009/7378 [13:45:15<11:44:14, 12.54s/it] + 54%|█████▍ | 4010/7378 [13:45:27<11:32:16, 12.33s/it] + +{'loss': 0.4632, 'learning_rate': 9.079373565853473e-06, 'epoch': 0.54} + + 54%|█████▍ | 4010/7378 [13:45:27<11:32:16, 12.33s/it] + 54%|█████▍ | 4011/7378 [13:45:39<11:30:35, 12.31s/it] + +{'loss': 0.4564, 'learning_rate': 9.075002146897438e-06, 'epoch': 0.54} + + 54%|█████▍ | 4011/7378 [13:45:39<11:30:35, 12.31s/it] + 54%|█████▍ | 4012/7378 [13:45:52<11:30:41, 12.31s/it] + +{'loss': 0.4445, 'learning_rate': 9.070630906220246e-06, 'epoch': 0.54} + + 54%|█████▍ | 4012/7378 [13:45:52<11:30:41, 12.31s/it] + 54%|█████▍ | 4013/7378 [13:46:04<11:29:14, 12.29s/it] + +{'loss': 0.4104, 'learning_rate': 
9.066259844664382e-06, 'epoch': 0.54} + + 54%|█████▍ | 4013/7378 [13:46:04<11:29:14, 12.29s/it] + 54%|█████▍ | 4014/7378 [13:46:16<11:24:46, 12.21s/it] + +{'loss': 0.4672, 'learning_rate': 9.061888963072298e-06, 'epoch': 0.54} + + 54%|█████▍ | 4014/7378 [13:46:16<11:24:46, 12.21s/it] + 54%|█████▍ | 4015/7378 [13:46:28<11:24:00, 12.20s/it] + +{'loss': 0.4826, 'learning_rate': 9.057518262286414e-06, 'epoch': 0.54} + + 54%|█████▍ | 4015/7378 [13:46:28<11:24:00, 12.20s/it] + 54%|█████▍ | 4016/7378 [13:46:41<11:28:59, 12.30s/it] + +{'loss': 0.435, 'learning_rate': 9.053147743149118e-06, 'epoch': 0.54} + + 54%|█████▍ | 4016/7378 [13:46:41<11:28:59, 12.30s/it] + 54%|█████▍ | 4017/7378 [13:46:53<11:23:37, 12.20s/it] + +{'loss': 0.5189, 'learning_rate': 9.048777406502754e-06, 'epoch': 0.54} + + 54%|█████▍ | 4017/7378 [13:46:53<11:23:37, 12.20s/it] + 54%|█████▍ | 4018/7378 [13:47:05<11:20:13, 12.15s/it] + +{'loss': 0.4265, 'learning_rate': 9.044407253189636e-06, 'epoch': 0.54} + + 54%|█████▍ | 4018/7378 [13:47:05<11:20:13, 12.15s/it] + 54%|█████▍ | 4019/7378 [13:47:17<11:29:48, 12.32s/it] + +{'loss': 0.4896, 'learning_rate': 9.040037284052046e-06, 'epoch': 0.54} + + 54%|█████▍ | 4019/7378 [13:47:17<11:29:48, 12.32s/it] + 54%|█████▍ | 4020/7378 [13:47:29<11:24:46, 12.24s/it] + +{'loss': 0.4514, 'learning_rate': 9.035667499932224e-06, 'epoch': 0.54} + + 54%|█████▍ | 4020/7378 [13:47:29<11:24:46, 12.24s/it] + 54%|█████▍ | 4021/7378 [13:47:42<11:32:01, 12.37s/it] + +{'loss': 0.46, 'learning_rate': 9.03129790167238e-06, 'epoch': 0.54} + + 54%|█████▍ | 4021/7378 [13:47:42<11:32:01, 12.37s/it] + 55%|█████▍ | 4022/7378 [13:47:54<11:25:13, 12.25s/it] + +{'loss': 0.4984, 'learning_rate': 9.026928490114683e-06, 'epoch': 0.55} + + 55%|█████▍ | 4022/7378 [13:47:54<11:25:13, 12.25s/it] + 55%|█████▍ | 4023/7378 [13:48:06<11:28:55, 12.32s/it] + +{'loss': 0.463, 'learning_rate': 9.02255926610127e-06, 'epoch': 0.55} + + 55%|█████▍ | 4023/7378 [13:48:06<11:28:55, 12.32s/it] + 55%|█████▍ | 
4024/7378 [13:48:19<11:28:19, 12.31s/it] + +{'loss': 0.4137, 'learning_rate': 9.018190230474242e-06, 'epoch': 0.55} + + 55%|█████▍ | 4024/7378 [13:48:19<11:28:19, 12.31s/it] + 55%|█████▍ | 4025/7378 [13:48:31<11:24:15, 12.24s/it] + +{'loss': 0.5178, 'learning_rate': 9.01382138407566e-06, 'epoch': 0.55} + + 55%|█████▍ | 4025/7378 [13:48:31<11:24:15, 12.24s/it] + 55%|█████▍ | 4026/7378 [13:48:43<11:27:53, 12.31s/it] + +{'loss': 0.4293, 'learning_rate': 9.00945272774755e-06, 'epoch': 0.55} + + 55%|█████▍ | 4026/7378 [13:48:43<11:27:53, 12.31s/it] + 55%|█████▍ | 4027/7378 [13:48:56<11:29:24, 12.34s/it] + +{'loss': 0.4433, 'learning_rate': 9.005084262331902e-06, 'epoch': 0.55} + + 55%|█████▍ | 4027/7378 [13:48:56<11:29:24, 12.34s/it] + 55%|█████▍ | 4028/7378 [13:49:08<11:26:55, 12.30s/it] + +{'loss': 0.5185, 'learning_rate': 9.000715988670672e-06, 'epoch': 0.55} + + 55%|█████▍ | 4028/7378 [13:49:08<11:26:55, 12.30s/it] + 55%|█████▍ | 4029/7378 [13:49:20<11:31:15, 12.38s/it] + +{'loss': 0.4626, 'learning_rate': 8.996347907605773e-06, 'epoch': 0.55} + + 55%|█████▍ | 4029/7378 [13:49:20<11:31:15, 12.38s/it] + 55%|█████▍ | 4030/7378 [13:49:33<11:32:08, 12.40s/it] + +{'loss': 0.4158, 'learning_rate': 8.99198001997909e-06, 'epoch': 0.55} + + 55%|█████▍ | 4030/7378 [13:49:33<11:32:08, 12.40s/it] + 55%|█████▍ | 4031/7378 [13:49:46<11:41:34, 12.58s/it] + +{'loss': 0.439, 'learning_rate': 8.987612326632457e-06, 'epoch': 0.55} + + 55%|█████▍ | 4031/7378 [13:49:46<11:41:34, 12.58s/it] + 55%|█████▍ | 4032/7378 [13:49:58<11:38:56, 12.53s/it] + +{'loss': 0.4772, 'learning_rate': 8.983244828407683e-06, 'epoch': 0.55} + + 55%|█████▍ | 4032/7378 [13:49:58<11:38:56, 12.53s/it] + 55%|█████▍ | 4033/7378 [13:50:11<11:39:59, 12.56s/it] + +{'loss': 0.5028, 'learning_rate': 8.978877526146536e-06, 'epoch': 0.55} + + 55%|█████▍ | 4033/7378 [13:50:11<11:39:59, 12.56s/it] + 55%|█████▍ | 4034/7378 [13:50:23<11:36:23, 12.50s/it] + +{'loss': 0.4109, 'learning_rate': 8.97451042069074e-06, 'epoch': 
0.55} + + 55%|█████▍ | 4034/7378 [13:50:23<11:36:23, 12.50s/it] + 55%|█████▍ | 4035/7378 [13:50:35<11:28:19, 12.35s/it] + +{'loss': 0.4335, 'learning_rate': 8.970143512881992e-06, 'epoch': 0.55} + + 55%|█████▍ | 4035/7378 [13:50:35<11:28:19, 12.35s/it] + 55%|█████▍ | 4036/7378 [13:50:47<11:19:41, 12.20s/it] + +{'loss': 0.5015, 'learning_rate': 8.965776803561942e-06, 'epoch': 0.55} + + 55%|█████▍ | 4036/7378 [13:50:47<11:19:41, 12.20s/it] + 55%|█████▍ | 4037/7378 [13:51:00<11:22:14, 12.25s/it] + +{'loss': 0.5038, 'learning_rate': 8.961410293572203e-06, 'epoch': 0.55} + + 55%|█████▍ | 4037/7378 [13:51:00<11:22:14, 12.25s/it] + 55%|█████▍ | 4038/7378 [13:51:12<11:25:38, 12.32s/it] + +{'loss': 0.5258, 'learning_rate': 8.957043983754355e-06, 'epoch': 0.55} + + 55%|█████▍ | 4038/7378 [13:51:12<11:25:38, 12.32s/it] + 55%|█████▍ | 4039/7378 [13:51:24<11:23:37, 12.28s/it] + +{'loss': 0.485, 'learning_rate': 8.952677874949934e-06, 'epoch': 0.55} + + 55%|█████▍ | 4039/7378 [13:51:24<11:23:37, 12.28s/it] + 55%|█████▍ | 4040/7378 [13:51:37<11:26:21, 12.34s/it] + +{'loss': 0.4446, 'learning_rate': 8.948311968000437e-06, 'epoch': 0.55} + + 55%|█████▍ | 4040/7378 [13:51:37<11:26:21, 12.34s/it] + 55%|█████▍ | 4041/7378 [13:51:49<11:23:35, 12.29s/it] + +{'loss': 0.4095, 'learning_rate': 8.943946263747327e-06, 'epoch': 0.55} + + 55%|█████▍ | 4041/7378 [13:51:49<11:23:35, 12.29s/it] + 55%|█████▍ | 4042/7378 [13:52:02<11:30:16, 12.41s/it] + +{'loss': 0.4629, 'learning_rate': 8.939580763032026e-06, 'epoch': 0.55} + + 55%|█████▍ | 4042/7378 [13:52:02<11:30:16, 12.41s/it] + 55%|█████▍ | 4043/7378 [13:52:14<11:23:33, 12.30s/it] + +{'loss': 0.4268, 'learning_rate': 8.935215466695916e-06, 'epoch': 0.55} + + 55%|█████▍ | 4043/7378 [13:52:14<11:23:33, 12.30s/it] + 55%|█████▍ | 4044/7378 [13:52:26<11:23:45, 12.31s/it] + +{'loss': 0.3831, 'learning_rate': 8.930850375580336e-06, 'epoch': 0.55} + + 55%|█████▍ | 4044/7378 [13:52:26<11:23:45, 12.31s/it] + 55%|█████▍ | 4045/7378 [13:52:38<11:26:06, 
12.35s/it] + +{'loss': 0.5161, 'learning_rate': 8.92648549052659e-06, 'epoch': 0.55} + + 55%|█████▍ | 4045/7378 [13:52:38<11:26:06, 12.35s/it] + 55%|█████▍ | 4046/7378 [13:52:50<11:21:07, 12.27s/it] + +{'loss': 0.4454, 'learning_rate': 8.922120812375942e-06, 'epoch': 0.55} + + 55%|█████▍ | 4046/7378 [13:52:50<11:21:07, 12.27s/it] + 55%|█████▍ | 4047/7378 [13:53:03<11:25:26, 12.35s/it] + +{'loss': 0.4904, 'learning_rate': 8.917756341969618e-06, 'epoch': 0.55} + + 55%|█████▍ | 4047/7378 [13:53:03<11:25:26, 12.35s/it] + 55%|█████▍ | 4048/7378 [13:53:16<11:28:33, 12.41s/it] + +{'loss': 0.452, 'learning_rate': 8.913392080148795e-06, 'epoch': 0.55} + + 55%|█████▍ | 4048/7378 [13:53:16<11:28:33, 12.41s/it] + 55%|█████▍ | 4049/7378 [13:53:28<11:32:08, 12.47s/it] + +{'loss': 0.4552, 'learning_rate': 8.909028027754622e-06, 'epoch': 0.55} + + 55%|█████▍ | 4049/7378 [13:53:28<11:32:08, 12.47s/it] + 55%|█████▍ | 4050/7378 [13:53:41<11:30:58, 12.46s/it] + +{'loss': 0.3987, 'learning_rate': 8.9046641856282e-06, 'epoch': 0.55} + + 55%|█████▍ | 4050/7378 [13:53:41<11:30:58, 12.46s/it] + 55%|█████▍ | 4051/7378 [13:53:53<11:28:35, 12.42s/it] + +{'loss': 0.4514, 'learning_rate': 8.900300554610587e-06, 'epoch': 0.55} + + 55%|█████▍ | 4051/7378 [13:53:53<11:28:35, 12.42s/it] + 55%|█████▍ | 4052/7378 [13:54:05<11:25:21, 12.36s/it] + +{'loss': 0.4357, 'learning_rate': 8.895937135542812e-06, 'epoch': 0.55} + + 55%|█████▍ | 4052/7378 [13:54:05<11:25:21, 12.36s/it] + 55%|█████▍ | 4053/7378 [13:54:17<11:24:23, 12.35s/it] + +{'loss': 0.4175, 'learning_rate': 8.891573929265848e-06, 'epoch': 0.55} + + 55%|█████▍ | 4053/7378 [13:54:17<11:24:23, 12.35s/it] + 55%|█████▍ | 4054/7378 [13:54:30<11:25:28, 12.37s/it] + +{'loss': 0.4427, 'learning_rate': 8.88721093662064e-06, 'epoch': 0.55} + + 55%|█████▍ | 4054/7378 [13:54:30<11:25:28, 12.37s/it] + 55%|█████▍ | 4055/7378 [13:54:42<11:19:36, 12.27s/it] + +{'loss': 0.4279, 'learning_rate': 8.882848158448084e-06, 'epoch': 0.55} + + 55%|█████▍ | 4055/7378 
[13:54:42<11:19:36, 12.27s/it] + 55%|█████▍ | 4056/7378 [13:54:54<11:16:32, 12.22s/it] + +{'loss': 0.4239, 'learning_rate': 8.878485595589039e-06, 'epoch': 0.55} + + 55%|█████▍ | 4056/7378 [13:54:54<11:16:32, 12.22s/it] + 55%|█████▍ | 4057/7378 [13:55:06<11:16:52, 12.23s/it] + +{'loss': 0.3924, 'learning_rate': 8.874123248884318e-06, 'epoch': 0.55} + + 55%|█████▍ | 4057/7378 [13:55:06<11:16:52, 12.23s/it] + 55%|█████▌ | 4058/7378 [13:55:19<11:17:37, 12.25s/it] + +{'loss': 0.5271, 'learning_rate': 8.869761119174697e-06, 'epoch': 0.55} + + 55%|█████▌ | 4058/7378 [13:55:19<11:17:37, 12.25s/it] + 55%|█████▌ | 4059/7378 [13:55:31<11:15:35, 12.21s/it] + +{'loss': 0.4402, 'learning_rate': 8.86539920730091e-06, 'epoch': 0.55} + + 55%|█████▌ | 4059/7378 [13:55:31<11:15:35, 12.21s/it] + 55%|█████▌ | 4060/7378 [13:55:43<11:16:52, 12.24s/it] + +{'loss': 0.4567, 'learning_rate': 8.86103751410364e-06, 'epoch': 0.55} + + 55%|█████▌ | 4060/7378 [13:55:43<11:16:52, 12.24s/it] + 55%|█████▌ | 4061/7378 [13:55:55<11:15:39, 12.22s/it] + +{'loss': 0.4759, 'learning_rate': 8.856676040423543e-06, 'epoch': 0.55} + + 55%|█████▌ | 4061/7378 [13:55:55<11:15:39, 12.22s/it] + 55%|█████▌ | 4062/7378 [13:56:08<11:21:01, 12.32s/it] + +{'loss': 0.4512, 'learning_rate': 8.852314787101219e-06, 'epoch': 0.55} + + 55%|█████▌ | 4062/7378 [13:56:08<11:21:01, 12.32s/it] + 55%|█████▌ | 4063/7378 [13:56:20<11:20:53, 12.32s/it] + +{'loss': 0.5017, 'learning_rate': 8.847953754977236e-06, 'epoch': 0.55} + + 55%|█████▌ | 4063/7378 [13:56:20<11:20:53, 12.32s/it] + 55%|█████▌ | 4064/7378 [13:56:32<11:20:41, 12.32s/it] + +{'loss': 0.5242, 'learning_rate': 8.84359294489211e-06, 'epoch': 0.55} + + 55%|█████▌ | 4064/7378 [13:56:32<11:20:41, 12.32s/it] + 55%|█████▌ | 4065/7378 [13:56:45<11:17:05, 12.26s/it] + +{'loss': 0.4386, 'learning_rate': 8.839232357686322e-06, 'epoch': 0.55} + + 55%|█████▌ | 4065/7378 [13:56:45<11:17:05, 12.26s/it] + 55%|█████▌ | 4066/7378 [13:56:57<11:20:17, 12.32s/it] + +{'loss': 0.4605, 
'learning_rate': 8.834871994200305e-06, 'epoch': 0.55} + + 55%|█████▌ | 4066/7378 [13:56:57<11:20:17, 12.32s/it] + 55%|█████▌ | 4067/7378 [13:57:10<11:25:39, 12.43s/it] + +{'loss': 0.4592, 'learning_rate': 8.830511855274454e-06, 'epoch': 0.55} + + 55%|█████▌ | 4067/7378 [13:57:10<11:25:39, 12.43s/it] + 55%|█████▌ | 4068/7378 [13:57:22<11:25:01, 12.42s/it] + +{'loss': 0.4531, 'learning_rate': 8.826151941749115e-06, 'epoch': 0.55} + + 55%|█████▌ | 4068/7378 [13:57:22<11:25:01, 12.42s/it] + 55%|█████▌ | 4069/7378 [13:57:34<11:24:20, 12.41s/it] + +{'loss': 0.4891, 'learning_rate': 8.82179225446459e-06, 'epoch': 0.55} + + 55%|█████▌ | 4069/7378 [13:57:34<11:24:20, 12.41s/it] + 55%|█████▌ | 4070/7378 [13:57:47<11:21:52, 12.37s/it] + +{'loss': 0.4001, 'learning_rate': 8.817432794261145e-06, 'epoch': 0.55} + + 55%|█████▌ | 4070/7378 [13:57:47<11:21:52, 12.37s/it] + 55%|█████▌ | 4071/7378 [13:57:59<11:15:24, 12.25s/it] + +{'loss': 0.4278, 'learning_rate': 8.813073561978996e-06, 'epoch': 0.55} + + 55%|█████▌ | 4071/7378 [13:57:59<11:15:24, 12.25s/it] + 55%|█████▌ | 4072/7378 [13:58:11<11:11:23, 12.19s/it] + +{'loss': 0.4054, 'learning_rate': 8.808714558458318e-06, 'epoch': 0.55} + + 55%|█████▌ | 4072/7378 [13:58:11<11:11:23, 12.19s/it] + 55%|█████▌ | 4073/7378 [13:58:23<11:18:09, 12.31s/it] + +{'loss': 0.4317, 'learning_rate': 8.804355784539236e-06, 'epoch': 0.55} + + 55%|█████▌ | 4073/7378 [13:58:23<11:18:09, 12.31s/it] + 55%|█████▌ | 4074/7378 [13:58:36<11:15:57, 12.28s/it] + +{'loss': 0.4143, 'learning_rate': 8.799997241061844e-06, 'epoch': 0.55} + + 55%|█████▌ | 4074/7378 [13:58:36<11:15:57, 12.28s/it] + 55%|█████▌ | 4075/7378 [13:58:48<11:14:38, 12.25s/it] + +{'loss': 0.4194, 'learning_rate': 8.795638928866174e-06, 'epoch': 0.55} + + 55%|█████▌ | 4075/7378 [13:58:48<11:14:38, 12.25s/it] + 55%|█████▌ | 4076/7378 [13:59:00<11:14:43, 12.26s/it] + +{'loss': 0.4807, 'learning_rate': 8.791280848792227e-06, 'epoch': 0.55} + + 55%|█████▌ | 4076/7378 [13:59:00<11:14:43, 
12.26s/it] + 55%|█████▌ | 4077/7378 [13:59:12<11:17:35, 12.32s/it] + +{'loss': 0.457, 'learning_rate': 8.786923001679953e-06, 'epoch': 0.55} + + 55%|█████▌ | 4077/7378 [13:59:12<11:17:35, 12.32s/it] + 55%|█████▌ | 4078/7378 [13:59:25<11:19:55, 12.36s/it] + +{'loss': 0.4953, 'learning_rate': 8.78256538836926e-06, 'epoch': 0.55} + + 55%|█████▌ | 4078/7378 [13:59:25<11:19:55, 12.36s/it] + 55%|█████▌ | 4079/7378 [13:59:37<11:16:39, 12.31s/it] + +{'loss': 0.4024, 'learning_rate': 8.778208009700008e-06, 'epoch': 0.55} + + 55%|█████▌ | 4079/7378 [13:59:37<11:16:39, 12.31s/it] + 55%|█████▌ | 4080/7378 [13:59:49<11:13:52, 12.26s/it] + +{'loss': 0.4728, 'learning_rate': 8.773850866512016e-06, 'epoch': 0.55} + + 55%|█████▌ | 4080/7378 [13:59:49<11:13:52, 12.26s/it] + 55%|█████▌ | 4081/7378 [14:00:02<11:19:00, 12.36s/it] + +{'loss': 0.4766, 'learning_rate': 8.769493959645055e-06, 'epoch': 0.55} + + 55%|█████▌ | 4081/7378 [14:00:02<11:19:00, 12.36s/it] + 55%|█████▌ | 4082/7378 [14:00:14<11:20:22, 12.39s/it] + +{'loss': 0.4605, 'learning_rate': 8.765137289938846e-06, 'epoch': 0.55} + + 55%|█████▌ | 4082/7378 [14:00:14<11:20:22, 12.39s/it] + 55%|█████▌ | 4083/7378 [14:00:26<11:11:07, 12.22s/it] + +{'loss': 0.5217, 'learning_rate': 8.760780858233074e-06, 'epoch': 0.55} + + 55%|█████▌ | 4083/7378 [14:00:26<11:11:07, 12.22s/it] + 55%|█████▌ | 4084/7378 [14:00:38<11:11:35, 12.23s/it] + +{'loss': 0.4799, 'learning_rate': 8.756424665367367e-06, 'epoch': 0.55} + + 55%|█████▌ | 4084/7378 [14:00:38<11:11:35, 12.23s/it] + 55%|█████▌ | 4085/7378 [14:00:51<11:10:18, 12.21s/it] + +{'loss': 0.4151, 'learning_rate': 8.75206871218132e-06, 'epoch': 0.55} + + 55%|█████▌ | 4085/7378 [14:00:51<11:10:18, 12.21s/it] + 55%|█████▌ | 4086/7378 [14:01:03<11:19:55, 12.39s/it] + +{'loss': 0.4764, 'learning_rate': 8.747712999514472e-06, 'epoch': 0.55} + + 55%|█████▌ | 4086/7378 [14:01:03<11:19:55, 12.39s/it] + 55%|█████▌ | 4087/7378 [14:01:16<11:17:09, 12.35s/it] + +{'loss': 0.4585, 'learning_rate': 
8.74335752820632e-06, 'epoch': 0.55} + + 55%|█████▌ | 4087/7378 [14:01:16<11:17:09, 12.35s/it] + 55%|█████▌ | 4088/7378 [14:01:28<11:21:31, 12.43s/it] + +{'loss': 0.4544, 'learning_rate': 8.739002299096305e-06, 'epoch': 0.55} + + 55%|█████▌ | 4088/7378 [14:01:28<11:21:31, 12.43s/it] + 55%|█████▌ | 4089/7378 [14:01:41<11:19:29, 12.40s/it] + +{'loss': 0.4348, 'learning_rate': 8.734647313023839e-06, 'epoch': 0.55} + + 55%|█████▌ | 4089/7378 [14:01:41<11:19:29, 12.40s/it] + 55%|█████▌ | 4090/7378 [14:01:53<11:19:57, 12.41s/it] + +{'loss': 0.4679, 'learning_rate': 8.730292570828271e-06, 'epoch': 0.55} + + 55%|█████▌ | 4090/7378 [14:01:53<11:19:57, 12.41s/it] + 55%|█████▌ | 4091/7378 [14:02:05<11:11:08, 12.25s/it] + +{'loss': 0.4567, 'learning_rate': 8.725938073348916e-06, 'epoch': 0.55} + + 55%|█████▌ | 4091/7378 [14:02:05<11:11:08, 12.25s/it] + 55%|█████▌ | 4092/7378 [14:02:17<11:07:53, 12.20s/it] + +{'loss': 0.4368, 'learning_rate': 8.721583821425025e-06, 'epoch': 0.55} + + 55%|█████▌ | 4092/7378 [14:02:17<11:07:53, 12.20s/it] + 55%|█████▌ | 4093/7378 [14:02:29<11:08:38, 12.21s/it] + +{'loss': 0.4409, 'learning_rate': 8.71722981589582e-06, 'epoch': 0.55} + + 55%|█████▌ | 4093/7378 [14:02:29<11:08:38, 12.21s/it] + 55%|█████▌ | 4094/7378 [14:02:41<11:09:45, 12.24s/it] + +{'loss': 0.4777, 'learning_rate': 8.712876057600467e-06, 'epoch': 0.55} + + 55%|█████▌ | 4094/7378 [14:02:41<11:09:45, 12.24s/it] + 56%|█████▌ | 4095/7378 [14:02:54<11:22:23, 12.47s/it] + +{'loss': 0.4686, 'learning_rate': 8.70852254737808e-06, 'epoch': 0.56} + + 56%|█████▌ | 4095/7378 [14:02:54<11:22:23, 12.47s/it] + 56%|█████▌ | 4096/7378 [14:03:07<11:23:39, 12.50s/it] + +{'loss': 0.4167, 'learning_rate': 8.704169286067733e-06, 'epoch': 0.56} + + 56%|█████▌ | 4096/7378 [14:03:07<11:23:39, 12.50s/it] + 56%|█████▌ | 4097/7378 [14:03:19<11:16:19, 12.37s/it] + +{'loss': 0.4198, 'learning_rate': 8.699816274508446e-06, 'epoch': 0.56} + + 56%|█████▌ | 4097/7378 [14:03:19<11:16:19, 12.37s/it] + 56%|█████▌ | 
4098/7378 [14:03:32<11:22:10, 12.48s/it] + +{'loss': 0.5102, 'learning_rate': 8.6954635135392e-06, 'epoch': 0.56} + + 56%|█████▌ | 4098/7378 [14:03:32<11:22:10, 12.48s/it] + 56%|█████▌ | 4099/7378 [14:03:44<11:18:52, 12.42s/it] + +{'loss': 0.4737, 'learning_rate': 8.691111003998913e-06, 'epoch': 0.56} + + 56%|█████▌ | 4099/7378 [14:03:44<11:18:52, 12.42s/it] + 56%|█████▌ | 4100/7378 [14:03:57<11:22:30, 12.49s/it] + +{'loss': 0.4746, 'learning_rate': 8.686758746726472e-06, 'epoch': 0.56} + + 56%|█████▌ | 4100/7378 [14:03:57<11:22:30, 12.49s/it] + 56%|█████▌ | 4101/7378 [14:04:09<11:15:05, 12.36s/it] + +{'loss': 0.4575, 'learning_rate': 8.682406742560698e-06, 'epoch': 0.56} + + 56%|█████▌ | 4101/7378 [14:04:09<11:15:05, 12.36s/it] + 56%|█████▌ | 4102/7378 [14:04:21<11:13:10, 12.33s/it] + +{'loss': 0.4716, 'learning_rate': 8.678054992340379e-06, 'epoch': 0.56} + + 56%|█████▌ | 4102/7378 [14:04:21<11:13:10, 12.33s/it] + 56%|█████▌ | 4103/7378 [14:04:34<11:16:56, 12.40s/it] + +{'loss': 0.4146, 'learning_rate': 8.673703496904243e-06, 'epoch': 0.56} + + 56%|█████▌ | 4103/7378 [14:04:34<11:16:56, 12.40s/it] + 56%|█████▌ | 4104/7378 [14:04:46<11:21:21, 12.49s/it] + +{'loss': 0.4191, 'learning_rate': 8.669352257090968e-06, 'epoch': 0.56} + + 56%|█████▌ | 4104/7378 [14:04:46<11:21:21, 12.49s/it] + 56%|█████▌ | 4105/7378 [14:04:59<11:18:32, 12.44s/it] + +{'loss': 0.3656, 'learning_rate': 8.665001273739197e-06, 'epoch': 0.56} + + 56%|█████▌ | 4105/7378 [14:04:59<11:18:32, 12.44s/it] + 56%|█████▌ | 4106/7378 [14:05:11<11:11:48, 12.32s/it] + +{'loss': 0.4989, 'learning_rate': 8.660650547687506e-06, 'epoch': 0.56} + + 56%|█████▌ | 4106/7378 [14:05:11<11:11:48, 12.32s/it] + 56%|█████▌ | 4107/7378 [14:05:23<11:13:37, 12.36s/it] + +{'loss': 0.3379, 'learning_rate': 8.656300079774432e-06, 'epoch': 0.56} + + 56%|█████▌ | 4107/7378 [14:05:23<11:13:37, 12.36s/it] + 56%|█████▌ | 4108/7378 [14:05:35<11:09:10, 12.28s/it] + +{'loss': 0.5012, 'learning_rate': 8.65194987083846e-06, 'epoch': 
0.56} + + 56%|█████▌ | 4108/7378 [14:05:35<11:09:10, 12.28s/it] + 56%|█████▌ | 4109/7378 [14:05:48<11:11:13, 12.32s/it] + +{'loss': 0.4746, 'learning_rate': 8.647599921718025e-06, 'epoch': 0.56} + + 56%|█████▌ | 4109/7378 [14:05:48<11:11:13, 12.32s/it] + 56%|█████▌ | 4110/7378 [14:06:00<11:17:51, 12.45s/it] + +{'loss': 0.4447, 'learning_rate': 8.64325023325151e-06, 'epoch': 0.56} + + 56%|█████▌ | 4110/7378 [14:06:00<11:17:51, 12.45s/it] + 56%|█████▌ | 4111/7378 [14:06:13<11:20:24, 12.50s/it] + +{'loss': 0.4885, 'learning_rate': 8.63890080627725e-06, 'epoch': 0.56} + + 56%|█████▌ | 4111/7378 [14:06:13<11:20:24, 12.50s/it] + 56%|█████▌ | 4112/7378 [14:06:26<11:21:59, 12.53s/it] + +{'loss': 0.4566, 'learning_rate': 8.63455164163353e-06, 'epoch': 0.56} + + 56%|█████▌ | 4112/7378 [14:06:26<11:21:59, 12.53s/it] + 56%|█████▌ | 4113/7378 [14:06:38<11:22:57, 12.55s/it] + +{'loss': 0.4384, 'learning_rate': 8.63020274015858e-06, 'epoch': 0.56} + + 56%|█████▌ | 4113/7378 [14:06:38<11:22:57, 12.55s/it] + 56%|█████▌ | 4114/7378 [14:06:50<11:16:09, 12.43s/it] + +{'loss': 0.4487, 'learning_rate': 8.625854102690587e-06, 'epoch': 0.56} + + 56%|█████▌ | 4114/7378 [14:06:50<11:16:09, 12.43s/it] + 56%|█████▌ | 4115/7378 [14:07:03<11:20:37, 12.52s/it] + +{'loss': 0.4591, 'learning_rate': 8.621505730067678e-06, 'epoch': 0.56} + + 56%|█████▌ | 4115/7378 [14:07:03<11:20:37, 12.52s/it] + 56%|█████▌ | 4116/7378 [14:07:16<11:25:46, 12.61s/it] + +{'loss': 0.4426, 'learning_rate': 8.617157623127938e-06, 'epoch': 0.56} + + 56%|█████▌ | 4116/7378 [14:07:16<11:25:46, 12.61s/it] + 56%|█████▌ | 4117/7378 [14:07:29<11:27:19, 12.65s/it] + +{'loss': 0.4482, 'learning_rate': 8.612809782709394e-06, 'epoch': 0.56} + + 56%|█████▌ | 4117/7378 [14:07:29<11:27:19, 12.65s/it] + 56%|█████▌ | 4118/7378 [14:07:41<11:21:47, 12.55s/it] + +{'loss': 0.4316, 'learning_rate': 8.608462209650026e-06, 'epoch': 0.56} + + 56%|█████▌ | 4118/7378 [14:07:41<11:21:47, 12.55s/it] + 56%|█████▌ | 4119/7378 [14:07:53<11:18:35, 
12.49s/it] + +{'loss': 0.3376, 'learning_rate': 8.60411490478776e-06, 'epoch': 0.56} + + 56%|█████▌ | 4119/7378 [14:07:53<11:18:35, 12.49s/it] + 56%|█████▌ | 4120/7378 [14:08:05<11:10:44, 12.35s/it] + +{'loss': 0.45, 'learning_rate': 8.599767868960467e-06, 'epoch': 0.56} + + 56%|█████▌ | 4120/7378 [14:08:05<11:10:44, 12.35s/it] + 56%|█████▌ | 4121/7378 [14:08:18<11:13:33, 12.41s/it] + +{'loss': 0.4594, 'learning_rate': 8.595421103005976e-06, 'epoch': 0.56} + + 56%|█████▌ | 4121/7378 [14:08:18<11:13:33, 12.41s/it] + 56%|█████▌ | 4122/7378 [14:08:30<11:14:03, 12.42s/it] + +{'loss': 0.4, 'learning_rate': 8.591074607762054e-06, 'epoch': 0.56} + + 56%|█████▌ | 4122/7378 [14:08:30<11:14:03, 12.42s/it] + 56%|█████▌ | 4123/7378 [14:08:43<11:11:27, 12.38s/it] + +{'loss': 0.4347, 'learning_rate': 8.586728384066421e-06, 'epoch': 0.56} + + 56%|█████▌ | 4123/7378 [14:08:43<11:11:27, 12.38s/it] + 56%|█████▌ | 4124/7378 [14:08:55<11:13:41, 12.42s/it] + +{'loss': 0.4433, 'learning_rate': 8.582382432756743e-06, 'epoch': 0.56} + + 56%|█████▌ | 4124/7378 [14:08:55<11:13:41, 12.42s/it] + 56%|█████▌ | 4125/7378 [14:09:08<11:14:26, 12.44s/it] + +{'loss': 0.4149, 'learning_rate': 8.578036754670635e-06, 'epoch': 0.56} + + 56%|█████▌ | 4125/7378 [14:09:08<11:14:26, 12.44s/it] + 56%|█████▌ | 4126/7378 [14:09:20<11:12:15, 12.40s/it] + +{'loss': 0.4472, 'learning_rate': 8.57369135064566e-06, 'epoch': 0.56} + + 56%|█████▌ | 4126/7378 [14:09:20<11:12:15, 12.40s/it] + 56%|█████▌ | 4127/7378 [14:09:32<11:14:05, 12.44s/it] + +{'loss': 0.4591, 'learning_rate': 8.569346221519323e-06, 'epoch': 0.56} + + 56%|█████▌ | 4127/7378 [14:09:32<11:14:05, 12.44s/it] + 56%|█████▌ | 4128/7378 [14:09:44<11:06:29, 12.30s/it] + +{'loss': 0.4441, 'learning_rate': 8.565001368129077e-06, 'epoch': 0.56} + + 56%|█████▌ | 4128/7378 [14:09:44<11:06:29, 12.30s/it] + 56%|█████▌ | 4129/7378 [14:09:57<11:08:02, 12.34s/it] + +{'loss': 0.4282, 'learning_rate': 8.560656791312332e-06, 'epoch': 0.56} + + 56%|█████▌ | 4129/7378 
[14:09:57<11:08:02, 12.34s/it] + 56%|█████▌ | 4130/7378 [14:10:09<11:04:03, 12.27s/it] + +{'loss': 0.4698, 'learning_rate': 8.556312491906433e-06, 'epoch': 0.56} + + 56%|█████▌ | 4130/7378 [14:10:09<11:04:03, 12.27s/it] + 56%|█████▌ | 4131/7378 [14:10:21<11:04:19, 12.28s/it] + +{'loss': 0.4618, 'learning_rate': 8.551968470748679e-06, 'epoch': 0.56} + + 56%|█████▌ | 4131/7378 [14:10:21<11:04:19, 12.28s/it] + 56%|█████▌ | 4132/7378 [14:10:34<11:04:08, 12.28s/it] + +{'loss': 0.4489, 'learning_rate': 8.547624728676307e-06, 'epoch': 0.56} + + 56%|█████▌ | 4132/7378 [14:10:34<11:04:08, 12.28s/it] + 56%|█████▌ | 4133/7378 [14:10:46<11:11:42, 12.42s/it] + +{'loss': 0.3961, 'learning_rate': 8.543281266526508e-06, 'epoch': 0.56} + + 56%|█████▌ | 4133/7378 [14:10:46<11:11:42, 12.42s/it] + 56%|█████▌ | 4134/7378 [14:10:59<11:12:39, 12.44s/it] + +{'loss': 0.5081, 'learning_rate': 8.538938085136413e-06, 'epoch': 0.56} + + 56%|█████▌ | 4134/7378 [14:10:59<11:12:39, 12.44s/it] + 56%|█████▌ | 4135/7378 [14:11:11<11:12:13, 12.44s/it] + +{'loss': 0.4474, 'learning_rate': 8.534595185343109e-06, 'epoch': 0.56} + + 56%|█████▌ | 4135/7378 [14:11:11<11:12:13, 12.44s/it] + 56%|█████▌ | 4136/7378 [14:11:24<11:14:01, 12.47s/it] + +{'loss': 0.4719, 'learning_rate': 8.530252567983615e-06, 'epoch': 0.56} + + 56%|█████▌ | 4136/7378 [14:11:24<11:14:01, 12.47s/it] + 56%|█████▌ | 4137/7378 [14:11:36<11:09:24, 12.39s/it] + +{'loss': 0.4104, 'learning_rate': 8.525910233894906e-06, 'epoch': 0.56} + + 56%|█████▌ | 4137/7378 [14:11:36<11:09:24, 12.39s/it] + 56%|█████▌ | 4138/7378 [14:11:48<11:05:42, 12.33s/it] + +{'loss': 0.4835, 'learning_rate': 8.521568183913898e-06, 'epoch': 0.56} + + 56%|█████▌ | 4138/7378 [14:11:48<11:05:42, 12.33s/it] + 56%|█████▌ | 4139/7378 [14:12:00<11:05:18, 12.32s/it] + +{'loss': 0.451, 'learning_rate': 8.517226418877452e-06, 'epoch': 0.56} + + 56%|█████▌ | 4139/7378 [14:12:00<11:05:18, 12.32s/it] + 56%|█████▌ | 4140/7378 [14:12:13<11:04:45, 12.32s/it] + +{'loss': 0.4204, 
'learning_rate': 8.512884939622377e-06, 'epoch': 0.56} + + 56%|█████▌ | 4140/7378 [14:12:13<11:04:45, 12.32s/it] + 56%|█████▌ | 4141/7378 [14:12:25<11:07:22, 12.37s/it] + +{'loss': 0.4138, 'learning_rate': 8.508543746985423e-06, 'epoch': 0.56} + + 56%|█████▌ | 4141/7378 [14:12:25<11:07:22, 12.37s/it] + 56%|█████▌ | 4142/7378 [14:12:38<11:08:32, 12.40s/it] + +{'loss': 0.4735, 'learning_rate': 8.50420284180329e-06, 'epoch': 0.56} + + 56%|█████▌ | 4142/7378 [14:12:38<11:08:32, 12.40s/it] + 56%|█████▌ | 4143/7378 [14:12:50<11:06:42, 12.37s/it] + +{'loss': 0.44, 'learning_rate': 8.499862224912617e-06, 'epoch': 0.56} + + 56%|█████▌ | 4143/7378 [14:12:50<11:06:42, 12.37s/it] + 56%|█████▌ | 4144/7378 [14:13:02<11:02:23, 12.29s/it] + +{'loss': 0.4675, 'learning_rate': 8.49552189714999e-06, 'epoch': 0.56} + + 56%|█████▌ | 4144/7378 [14:13:02<11:02:23, 12.29s/it] + 56%|█████▌ | 4145/7378 [14:13:15<11:06:26, 12.37s/it] + +{'loss': 0.3964, 'learning_rate': 8.491181859351941e-06, 'epoch': 0.56} + + 56%|█████▌ | 4145/7378 [14:13:15<11:06:26, 12.37s/it] + 56%|█████▌ | 4146/7378 [14:13:27<11:03:06, 12.31s/it] + +{'loss': 0.4377, 'learning_rate': 8.48684211235494e-06, 'epoch': 0.56} + + 56%|█████▌ | 4146/7378 [14:13:27<11:03:06, 12.31s/it] + 56%|█████▌ | 4147/7378 [14:13:39<10:59:51, 12.25s/it] + +{'loss': 0.4183, 'learning_rate': 8.482502656995411e-06, 'epoch': 0.56} + + 56%|█████▌ | 4147/7378 [14:13:39<10:59:51, 12.25s/it] + 56%|█████▌ | 4148/7378 [14:13:51<11:02:03, 12.30s/it] + +{'loss': 0.4994, 'learning_rate': 8.47816349410971e-06, 'epoch': 0.56} + + 56%|█████▌ | 4148/7378 [14:13:51<11:02:03, 12.30s/it] + 56%|█████▌ | 4149/7378 [14:14:04<11:02:36, 12.31s/it] + +{'loss': 0.4178, 'learning_rate': 8.47382462453415e-06, 'epoch': 0.56} + + 56%|█████▌ | 4149/7378 [14:14:04<11:02:36, 12.31s/it] + 56%|█████▌ | 4150/7378 [14:14:16<11:01:31, 12.30s/it] + +{'loss': 0.5067, 'learning_rate': 8.469486049104972e-06, 'epoch': 0.56} + + 56%|█████▌ | 4150/7378 [14:14:16<11:01:31, 12.30s/it] + 
56%|█████▋ | 4151/7378 [14:14:28<11:00:13, 12.28s/it] + +{'loss': 0.4297, 'learning_rate': 8.465147768658374e-06, 'epoch': 0.56} + + 56%|█████▋ | 4151/7378 [14:14:28<11:00:13, 12.28s/it] + 56%|█████▋ | 4152/7378 [14:14:41<11:04:24, 12.36s/it] + +{'loss': 0.4546, 'learning_rate': 8.460809784030491e-06, 'epoch': 0.56} + + 56%|█████▋ | 4152/7378 [14:14:41<11:04:24, 12.36s/it] + 56%|█████▋ | 4153/7378 [14:14:53<10:59:36, 12.27s/it] + +{'loss': 0.5349, 'learning_rate': 8.456472096057398e-06, 'epoch': 0.56} + + 56%|█████▋ | 4153/7378 [14:14:53<10:59:36, 12.27s/it] + 56%|█████▋ | 4154/7378 [14:15:05<10:58:38, 12.26s/it] + +{'loss': 0.4075, 'learning_rate': 8.452134705575121e-06, 'epoch': 0.56} + + 56%|█████▋ | 4154/7378 [14:15:05<10:58:38, 12.26s/it] + 56%|█████▋ | 4155/7378 [14:15:17<10:56:25, 12.22s/it] + +{'loss': 0.4724, 'learning_rate': 8.44779761341962e-06, 'epoch': 0.56} + + 56%|█████▋ | 4155/7378 [14:15:17<10:56:25, 12.22s/it] + 56%|█████▋ | 4156/7378 [14:15:29<10:48:30, 12.08s/it] + +{'loss': 0.388, 'learning_rate': 8.443460820426806e-06, 'epoch': 0.56} + + 56%|█████▋ | 4156/7378 [14:15:29<10:48:30, 12.08s/it] + 56%|█████▋ | 4157/7378 [14:15:41<10:50:52, 12.12s/it] + +{'loss': 0.4281, 'learning_rate': 8.439124327432521e-06, 'epoch': 0.56} + + 56%|█████▋ | 4157/7378 [14:15:41<10:50:52, 12.12s/it] + 56%|█████▋ | 4158/7378 [14:15:54<11:00:08, 12.30s/it] + +{'loss': 0.4841, 'learning_rate': 8.434788135272564e-06, 'epoch': 0.56} + + 56%|█████▋ | 4158/7378 [14:15:54<11:00:08, 12.30s/it] + 56%|█████▋ | 4159/7378 [14:16:06<10:59:48, 12.30s/it] + +{'loss': 0.499, 'learning_rate': 8.430452244782663e-06, 'epoch': 0.56} + + 56%|█████▋ | 4159/7378 [14:16:06<10:59:48, 12.30s/it] + 56%|█████▋ | 4160/7378 [14:16:19<11:04:52, 12.40s/it] + +{'loss': 0.423, 'learning_rate': 8.426116656798495e-06, 'epoch': 0.56} + + 56%|█████▋ | 4160/7378 [14:16:19<11:04:52, 12.40s/it] + 56%|█████▋ | 4161/7378 [14:16:31<11:04:32, 12.39s/it] + +{'loss': 0.4461, 'learning_rate': 8.421781372155675e-06, 
'epoch': 0.56} + + 56%|█████▋ | 4161/7378 [14:16:31<11:04:32, 12.39s/it] + 56%|█████▋ | 4162/7378 [14:16:43<10:59:07, 12.30s/it] + +{'loss': 0.4095, 'learning_rate': 8.417446391689762e-06, 'epoch': 0.56} + + 56%|█████▋ | 4162/7378 [14:16:43<10:59:07, 12.30s/it] + 56%|█████▋ | 4163/7378 [14:16:56<10:59:22, 12.31s/it] + +{'loss': 0.5226, 'learning_rate': 8.413111716236257e-06, 'epoch': 0.56} + + 56%|█████▋ | 4163/7378 [14:16:56<10:59:22, 12.31s/it] + 56%|█████▋ | 4164/7378 [14:17:08<10:55:27, 12.24s/it] + +{'loss': 0.4742, 'learning_rate': 8.408777346630597e-06, 'epoch': 0.56} + + 56%|█████▋ | 4164/7378 [14:17:08<10:55:27, 12.24s/it] + 56%|█████▋ | 4165/7378 [14:17:20<10:54:46, 12.23s/it] + +{'loss': 0.4592, 'learning_rate': 8.40444328370817e-06, 'epoch': 0.56} + + 56%|█████▋ | 4165/7378 [14:17:20<10:54:46, 12.23s/it] + 56%|█████▋ | 4166/7378 [14:17:32<10:57:44, 12.29s/it] + +{'loss': 0.5015, 'learning_rate': 8.400109528304292e-06, 'epoch': 0.56} + + 56%|█████▋ | 4166/7378 [14:17:32<10:57:44, 12.29s/it] + 56%|█████▋ | 4167/7378 [14:17:45<10:56:48, 12.27s/it] + +{'loss': 0.3997, 'learning_rate': 8.39577608125423e-06, 'epoch': 0.56} + + 56%|█████▋ | 4167/7378 [14:17:45<10:56:48, 12.27s/it] + 56%|█████▋ | 4168/7378 [14:17:57<11:03:04, 12.39s/it] + +{'loss': 0.5346, 'learning_rate': 8.391442943393188e-06, 'epoch': 0.56} + + 56%|█████▋ | 4168/7378 [14:17:57<11:03:04, 12.39s/it] + 57%|█████▋ | 4169/7378 [14:18:09<10:56:22, 12.27s/it] + +{'loss': 0.4581, 'learning_rate': 8.387110115556311e-06, 'epoch': 0.57} + + 57%|█████▋ | 4169/7378 [14:18:09<10:56:22, 12.27s/it] + 57%|█████▋ | 4170/7378 [14:18:21<10:55:37, 12.26s/it] + +{'loss': 0.4789, 'learning_rate': 8.382777598578683e-06, 'epoch': 0.57} + + 57%|█████▋ | 4170/7378 [14:18:21<10:55:37, 12.26s/it] + 57%|█████▋ | 4171/7378 [14:18:34<11:05:38, 12.45s/it] + +{'loss': 0.4481, 'learning_rate': 8.378445393295321e-06, 'epoch': 0.57} + + 57%|█████▋ | 4171/7378 [14:18:34<11:05:38, 12.45s/it] + 57%|█████▋ | 4172/7378 
[14:18:46<10:59:01, 12.33s/it] + +{'loss': 0.4237, 'learning_rate': 8.374113500541202e-06, 'epoch': 0.57} + + 57%|█████▋ | 4172/7378 [14:18:46<10:59:01, 12.33s/it] + 57%|█████▋ | 4173/7378 [14:18:59<10:58:37, 12.33s/it] + +{'loss': 0.5088, 'learning_rate': 8.369781921151226e-06, 'epoch': 0.57} + + 57%|█████▋ | 4173/7378 [14:18:59<10:58:37, 12.33s/it] + 57%|█████▋ | 4174/7378 [14:19:11<10:57:19, 12.31s/it] + +{'loss': 0.4681, 'learning_rate': 8.365450655960236e-06, 'epoch': 0.57} + + 57%|█████▋ | 4174/7378 [14:19:11<10:57:19, 12.31s/it] + 57%|█████▋ | 4175/7378 [14:19:23<10:52:59, 12.23s/it] + +{'loss': 0.4489, 'learning_rate': 8.361119705803016e-06, 'epoch': 0.57} + + 57%|█████▋ | 4175/7378 [14:19:23<10:52:59, 12.23s/it] + 57%|█████▋ | 4176/7378 [14:19:36<10:58:58, 12.35s/it] + +{'loss': 0.4647, 'learning_rate': 8.356789071514288e-06, 'epoch': 0.57} + + 57%|█████▋ | 4176/7378 [14:19:36<10:58:58, 12.35s/it] + 57%|█████▋ | 4177/7378 [14:19:48<11:03:02, 12.43s/it] + +{'loss': 0.506, 'learning_rate': 8.352458753928716e-06, 'epoch': 0.57} + + 57%|█████▋ | 4177/7378 [14:19:48<11:03:02, 12.43s/it] + 57%|█████▋ | 4178/7378 [14:20:01<11:02:26, 12.42s/it] + +{'loss': 0.4757, 'learning_rate': 8.348128753880898e-06, 'epoch': 0.57} + + 57%|█████▋ | 4178/7378 [14:20:01<11:02:26, 12.42s/it] + 57%|█████▋ | 4179/7378 [14:20:13<11:00:39, 12.39s/it] + +{'loss': 0.423, 'learning_rate': 8.343799072205376e-06, 'epoch': 0.57} + + 57%|█████▋ | 4179/7378 [14:20:13<11:00:39, 12.39s/it] + 57%|█████▋ | 4180/7378 [14:20:25<11:01:49, 12.42s/it] + +{'loss': 0.4693, 'learning_rate': 8.33946970973663e-06, 'epoch': 0.57} + + 57%|█████▋ | 4180/7378 [14:20:25<11:01:49, 12.42s/it] + 57%|█████▋ | 4181/7378 [14:20:38<11:00:04, 12.39s/it] + +{'loss': 0.4995, 'learning_rate': 8.33514066730907e-06, 'epoch': 0.57} + + 57%|█████▋ | 4181/7378 [14:20:38<11:00:04, 12.39s/it] + 57%|█████▋ | 4182/7378 [14:20:50<10:57:05, 12.34s/it] + +{'loss': 0.4731, 'learning_rate': 8.330811945757056e-06, 'epoch': 0.57} + + 
57%|█████▋ | 4182/7378 [14:20:50<10:57:05, 12.34s/it] + 57%|█████▋ | 4183/7378 [14:21:03<11:00:44, 12.41s/it] + +{'loss': 0.4558, 'learning_rate': 8.32648354591488e-06, 'epoch': 0.57} + + 57%|█████▋ | 4183/7378 [14:21:03<11:00:44, 12.41s/it] + 57%|█████▋ | 4184/7378 [14:21:15<10:55:07, 12.31s/it] + +{'loss': 0.5259, 'learning_rate': 8.322155468616777e-06, 'epoch': 0.57} + + 57%|█████▋ | 4184/7378 [14:21:15<10:55:07, 12.31s/it] + 57%|█████▋ | 4185/7378 [14:21:27<10:52:15, 12.26s/it] + +{'loss': 0.4217, 'learning_rate': 8.317827714696908e-06, 'epoch': 0.57} + + 57%|█████▋ | 4185/7378 [14:21:27<10:52:15, 12.26s/it] + 57%|█████▋ | 4186/7378 [14:21:39<10:52:45, 12.27s/it] + +{'loss': 0.4115, 'learning_rate': 8.313500284989388e-06, 'epoch': 0.57} + + 57%|█████▋ | 4186/7378 [14:21:39<10:52:45, 12.27s/it] + 57%|█████▋ | 4187/7378 [14:21:52<11:01:16, 12.43s/it] + +{'loss': 0.4636, 'learning_rate': 8.309173180328255e-06, 'epoch': 0.57} + + 57%|█████▋ | 4187/7378 [14:21:52<11:01:16, 12.43s/it] + 57%|█████▋ | 4188/7378 [14:22:04<10:56:14, 12.34s/it] + +{'loss': 0.4395, 'learning_rate': 8.304846401547496e-06, 'epoch': 0.57} + + 57%|█████▋ | 4188/7378 [14:22:04<10:56:14, 12.34s/it] + 57%|█████▋ | 4189/7378 [14:22:16<10:53:06, 12.29s/it] + +{'loss': 0.4323, 'learning_rate': 8.300519949481028e-06, 'epoch': 0.57} + + 57%|█████▋ | 4189/7378 [14:22:16<10:53:06, 12.29s/it] + 57%|█████▋ | 4190/7378 [14:22:28<10:52:10, 12.27s/it] + +{'loss': 0.4203, 'learning_rate': 8.296193824962702e-06, 'epoch': 0.57} + + 57%|█████▋ | 4190/7378 [14:22:28<10:52:10, 12.27s/it] + 57%|█████▋ | 4191/7378 [14:22:41<10:49:46, 12.23s/it] + +{'loss': 0.5232, 'learning_rate': 8.291868028826317e-06, 'epoch': 0.57} + + 57%|█████▋ | 4191/7378 [14:22:41<10:49:46, 12.23s/it] + 57%|█████▋ | 4192/7378 [14:22:53<10:49:57, 12.24s/it] + +{'loss': 0.4669, 'learning_rate': 8.2875425619056e-06, 'epoch': 0.57} + + 57%|█████▋ | 4192/7378 [14:22:53<10:49:57, 12.24s/it] + 57%|█████▋ | 4193/7378 [14:23:05<10:49:53, 12.24s/it] + 
+{'loss': 0.4171, 'learning_rate': 8.283217425034218e-06, 'epoch': 0.57} + + 57%|█████▋ | 4193/7378 [14:23:05<10:49:53, 12.24s/it] + 57%|█████▋ | 4194/7378 [14:23:17<10:49:36, 12.24s/it] + +{'loss': 0.5174, 'learning_rate': 8.27889261904577e-06, 'epoch': 0.57} + + 57%|█████▋ | 4194/7378 [14:23:17<10:49:36, 12.24s/it] + 57%|█████▋ | 4195/7378 [14:23:30<10:50:33, 12.26s/it] + +{'loss': 0.4858, 'learning_rate': 8.2745681447738e-06, 'epoch': 0.57} + + 57%|█████▋ | 4195/7378 [14:23:30<10:50:33, 12.26s/it] + 57%|█████▋ | 4196/7378 [14:23:42<10:57:46, 12.40s/it] + +{'loss': 0.4706, 'learning_rate': 8.27024400305178e-06, 'epoch': 0.57} + + 57%|█████▋ | 4196/7378 [14:23:42<10:57:46, 12.40s/it] + 57%|█████▋ | 4197/7378 [14:23:55<10:55:39, 12.37s/it] + +{'loss': 0.5131, 'learning_rate': 8.265920194713116e-06, 'epoch': 0.57} + + 57%|█████▋ | 4197/7378 [14:23:55<10:55:39, 12.37s/it] + 57%|█████▋ | 4198/7378 [14:24:07<10:53:01, 12.32s/it] + +{'loss': 0.4647, 'learning_rate': 8.261596720591164e-06, 'epoch': 0.57} + + 57%|█████▋ | 4198/7378 [14:24:07<10:53:01, 12.32s/it] + 57%|█████▋ | 4199/7378 [14:24:19<10:48:19, 12.24s/it] + +{'loss': 0.4976, 'learning_rate': 8.257273581519193e-06, 'epoch': 0.57} + + 57%|█████▋ | 4199/7378 [14:24:19<10:48:19, 12.24s/it] + 57%|█████▋ | 4200/7378 [14:24:32<10:54:04, 12.35s/it] + +{'loss': 0.4049, 'learning_rate': 8.252950778330434e-06, 'epoch': 0.57} + + 57%|█████▋ | 4200/7378 [14:24:32<10:54:04, 12.35s/it] + 57%|█████▋ | 4201/7378 [14:24:44<10:52:47, 12.33s/it] + +{'loss': 0.4438, 'learning_rate': 8.24862831185803e-06, 'epoch': 0.57} + + 57%|█████▋ | 4201/7378 [14:24:44<10:52:47, 12.33s/it] + 57%|█████▋ | 4202/7378 [14:24:56<10:48:27, 12.25s/it] + +{'loss': 0.4108, 'learning_rate': 8.244306182935074e-06, 'epoch': 0.57} + + 57%|█████▋ | 4202/7378 [14:24:56<10:48:27, 12.25s/it] + 57%|█████▋ | 4203/7378 [14:25:08<10:45:22, 12.20s/it] + +{'loss': 0.4625, 'learning_rate': 8.239984392394584e-06, 'epoch': 0.57} + + 57%|█████▋ | 4203/7378 
[14:25:08<10:45:22, 12.20s/it] + 57%|█████▋ | 4204/7378 [14:25:20<10:45:52, 12.21s/it] + +{'loss': 0.4198, 'learning_rate': 8.235662941069523e-06, 'epoch': 0.57} + + 57%|█████▋ | 4204/7378 [14:25:20<10:45:52, 12.21s/it] + 57%|█████▋ | 4205/7378 [14:25:33<10:49:14, 12.28s/it] + +{'loss': 0.4343, 'learning_rate': 8.23134182979278e-06, 'epoch': 0.57} + + 57%|█████▋ | 4205/7378 [14:25:33<10:49:14, 12.28s/it] + 57%|█████▋ | 4206/7378 [14:25:45<10:53:09, 12.35s/it] + +{'loss': 0.5203, 'learning_rate': 8.22702105939718e-06, 'epoch': 0.57} + + 57%|█████▋ | 4206/7378 [14:25:45<10:53:09, 12.35s/it] + 57%|█████▋ | 4207/7378 [14:25:58<10:59:30, 12.48s/it] + +{'loss': 0.4467, 'learning_rate': 8.222700630715486e-06, 'epoch': 0.57} + + 57%|█████▋ | 4207/7378 [14:25:58<10:59:30, 12.48s/it] + 57%|█████▋ | 4208/7378 [14:26:10<10:53:16, 12.36s/it] + +{'loss': 0.4544, 'learning_rate': 8.218380544580388e-06, 'epoch': 0.57} + + 57%|█████▋ | 4208/7378 [14:26:10<10:53:16, 12.36s/it] + 57%|█████▋ | 4209/7378 [14:26:23<10:55:37, 12.41s/it] + +{'loss': 0.4708, 'learning_rate': 8.214060801824524e-06, 'epoch': 0.57} + + 57%|█████▋ | 4209/7378 [14:26:23<10:55:37, 12.41s/it] + 57%|█████▋ | 4210/7378 [14:26:35<10:56:39, 12.44s/it] + +{'loss': 0.5255, 'learning_rate': 8.209741403280449e-06, 'epoch': 0.57} + + 57%|█████▋ | 4210/7378 [14:26:35<10:56:39, 12.44s/it] + 57%|█████▋ | 4211/7378 [14:26:47<10:47:34, 12.27s/it] + +{'loss': 0.4425, 'learning_rate': 8.205422349780664e-06, 'epoch': 0.57} + + 57%|█████▋ | 4211/7378 [14:26:47<10:47:34, 12.27s/it] + 57%|█████▋ | 4212/7378 [14:26:59<10:42:49, 12.18s/it] + +{'loss': 0.4329, 'learning_rate': 8.201103642157597e-06, 'epoch': 0.57} + + 57%|█████▋ | 4212/7378 [14:26:59<10:42:49, 12.18s/it] + 57%|█████▋ | 4213/7378 [14:27:11<10:44:34, 12.22s/it] + +{'loss': 0.4977, 'learning_rate': 8.19678528124361e-06, 'epoch': 0.57} + + 57%|█████▋ | 4213/7378 [14:27:11<10:44:34, 12.22s/it] + 57%|█████▋ | 4214/7378 [14:27:24<10:50:04, 12.33s/it] + +{'loss': 0.4171, 
'learning_rate': 8.192467267871003e-06, 'epoch': 0.57} + + 57%|█████▋ | 4214/7378 [14:27:24<10:50:04, 12.33s/it] + 57%|█████▋ | 4215/7378 [14:27:36<10:46:11, 12.26s/it] + +{'loss': 0.4772, 'learning_rate': 8.188149602871998e-06, 'epoch': 0.57} + + 57%|█████▋ | 4215/7378 [14:27:36<10:46:11, 12.26s/it] + 57%|█████▋ | 4216/7378 [14:27:48<10:42:49, 12.20s/it] + +{'loss': 0.4876, 'learning_rate': 8.183832287078763e-06, 'epoch': 0.57} + + 57%|█████▋ | 4216/7378 [14:27:48<10:42:49, 12.20s/it] + 57%|█████▋ | 4217/7378 [14:28:00<10:46:05, 12.26s/it] + +{'loss': 0.4917, 'learning_rate': 8.179515321323397e-06, 'epoch': 0.57} + + 57%|█████▋ | 4217/7378 [14:28:00<10:46:05, 12.26s/it] + 57%|█████▋ | 4218/7378 [14:28:13<10:48:53, 12.32s/it] + +{'loss': 0.4323, 'learning_rate': 8.175198706437918e-06, 'epoch': 0.57} + + 57%|█████▋ | 4218/7378 [14:28:13<10:48:53, 12.32s/it] + 57%|█████▋ | 4219/7378 [14:28:25<10:47:33, 12.30s/it] + +{'loss': 0.4212, 'learning_rate': 8.170882443254294e-06, 'epoch': 0.57} + + 57%|█████▋ | 4219/7378 [14:28:25<10:47:33, 12.30s/it] + 57%|█████▋ | 4220/7378 [14:28:38<10:50:32, 12.36s/it] + +{'loss': 0.434, 'learning_rate': 8.166566532604411e-06, 'epoch': 0.57} + + 57%|█████▋ | 4220/7378 [14:28:38<10:50:32, 12.36s/it] + 57%|█████▋ | 4221/7378 [14:28:50<10:50:02, 12.35s/it] + +{'loss': 0.5134, 'learning_rate': 8.1622509753201e-06, 'epoch': 0.57} + + 57%|█████▋ | 4221/7378 [14:28:50<10:50:02, 12.35s/it] + 57%|█████▋ | 4222/7378 [14:29:02<10:47:11, 12.30s/it] + +{'loss': 0.4833, 'learning_rate': 8.15793577223311e-06, 'epoch': 0.57} + + 57%|█████▋ | 4222/7378 [14:29:02<10:47:11, 12.30s/it] + 57%|█████▋ | 4223/7378 [14:29:14<10:47:56, 12.32s/it] + +{'loss': 0.4662, 'learning_rate': 8.153620924175132e-06, 'epoch': 0.57} + + 57%|█████▋ | 4223/7378 [14:29:14<10:47:56, 12.32s/it] + 57%|█████▋ | 4224/7378 [14:29:27<10:49:01, 12.35s/it] + +{'loss': 0.4897, 'learning_rate': 8.149306431977785e-06, 'epoch': 0.57} + + 57%|█████▋ | 4224/7378 [14:29:27<10:49:01, 12.35s/it] 
+ 57%|█████▋ | 4225/7378 [14:29:39<10:45:28, 12.28s/it] + +{'loss': 0.4901, 'learning_rate': 8.14499229647262e-06, 'epoch': 0.57} + + 57%|█████▋ | 4225/7378 [14:29:39<10:45:28, 12.28s/it] + 57%|█████▋ | 4226/7378 [14:29:51<10:42:34, 12.23s/it] + +{'loss': 0.47, 'learning_rate': 8.140678518491118e-06, 'epoch': 0.57} + + 57%|█████▋ | 4226/7378 [14:29:51<10:42:34, 12.23s/it] + 57%|█████▋ | 4227/7378 [14:30:03<10:43:58, 12.26s/it] + +{'loss': 0.3962, 'learning_rate': 8.136365098864693e-06, 'epoch': 0.57} + + 57%|█████▋ | 4227/7378 [14:30:03<10:43:58, 12.26s/it] + 57%|█████▋ | 4228/7378 [14:30:16<10:44:12, 12.27s/it] + +{'loss': 0.4171, 'learning_rate': 8.132052038424689e-06, 'epoch': 0.57} + + 57%|█████▋ | 4228/7378 [14:30:16<10:44:12, 12.27s/it] + 57%|█████▋ | 4229/7378 [14:30:28<10:47:39, 12.34s/it] + +{'loss': 0.4054, 'learning_rate': 8.12773933800238e-06, 'epoch': 0.57} + + 57%|█████▋ | 4229/7378 [14:30:28<10:47:39, 12.34s/it] + 57%|█████▋ | 4230/7378 [14:30:41<10:48:11, 12.35s/it] + +{'loss': 0.4443, 'learning_rate': 8.123426998428974e-06, 'epoch': 0.57} + + 57%|█████▋ | 4230/7378 [14:30:41<10:48:11, 12.35s/it] + 57%|█████▋ | 4231/7378 [14:30:53<10:51:55, 12.43s/it] + +{'loss': 0.4678, 'learning_rate': 8.119115020535605e-06, 'epoch': 0.57} + + 57%|█████▋ | 4231/7378 [14:30:53<10:51:55, 12.43s/it] + 57%|█████▋ | 4232/7378 [14:31:05<10:45:46, 12.32s/it] + +{'loss': 0.4209, 'learning_rate': 8.114803405153337e-06, 'epoch': 0.57} + + 57%|█████▋ | 4232/7378 [14:31:05<10:45:46, 12.32s/it] + 57%|█████▋ | 4233/7378 [14:31:18<10:45:06, 12.31s/it] + +{'loss': 0.4726, 'learning_rate': 8.11049215311317e-06, 'epoch': 0.57} + + 57%|█████▋ | 4233/7378 [14:31:18<10:45:06, 12.31s/it] + 57%|█████▋ | 4234/7378 [14:31:30<10:40:55, 12.23s/it] + +{'loss': 0.4393, 'learning_rate': 8.106181265246026e-06, 'epoch': 0.57} + + 57%|█████▋ | 4234/7378 [14:31:30<10:40:55, 12.23s/it] + 57%|█████▋ | 4235/7378 [14:31:42<10:44:39, 12.31s/it] + +{'loss': 0.4719, 'learning_rate': 
8.101870742382768e-06, 'epoch': 0.57} + + 57%|█████▋ | 4235/7378 [14:31:42<10:44:39, 12.31s/it] + 57%|█████▋ | 4236/7378 [14:31:54<10:38:52, 12.20s/it] + +{'loss': 0.4431, 'learning_rate': 8.097560585354176e-06, 'epoch': 0.57} + + 57%|█████▋ | 4236/7378 [14:31:54<10:38:52, 12.20s/it] + 57%|█████▋ | 4237/7378 [14:32:07<10:43:52, 12.30s/it] + +{'loss': 0.4737, 'learning_rate': 8.093250794990966e-06, 'epoch': 0.57} + + 57%|█████▋ | 4237/7378 [14:32:07<10:43:52, 12.30s/it] + 57%|█████▋ | 4238/7378 [14:32:19<10:44:17, 12.31s/it] + +{'loss': 0.4228, 'learning_rate': 8.088941372123781e-06, 'epoch': 0.57} + + 57%|█████▋ | 4238/7378 [14:32:19<10:44:17, 12.31s/it] + 57%|█████▋ | 4239/7378 [14:32:31<10:38:34, 12.21s/it] + +{'loss': 0.4801, 'learning_rate': 8.0846323175832e-06, 'epoch': 0.57} + + 57%|█████▋ | 4239/7378 [14:32:31<10:38:34, 12.21s/it] + 57%|█████▋ | 4240/7378 [14:32:43<10:40:28, 12.25s/it] + +{'loss': 0.4488, 'learning_rate': 8.080323632199724e-06, 'epoch': 0.57} + + 57%|█████▋ | 4240/7378 [14:32:43<10:40:28, 12.25s/it] + 57%|█████▋ | 4241/7378 [14:32:57<10:56:56, 12.57s/it] + +{'loss': 0.4545, 'learning_rate': 8.076015316803781e-06, 'epoch': 0.57} + + 57%|█████▋ | 4241/7378 [14:32:57<10:56:56, 12.57s/it] + 57%|█████▋ | 4242/7378 [14:33:09<10:53:45, 12.51s/it] + +{'loss': 0.4794, 'learning_rate': 8.071707372225734e-06, 'epoch': 0.57} + + 57%|█████▋ | 4242/7378 [14:33:09<10:53:45, 12.51s/it] + 58%|█████▊ | 4243/7378 [14:33:21<10:41:55, 12.29s/it] + +{'loss': 0.4618, 'learning_rate': 8.067399799295872e-06, 'epoch': 0.58} + + 58%|█████▊ | 4243/7378 [14:33:21<10:41:55, 12.29s/it] + 58%|█████▊ | 4244/7378 [14:33:33<10:42:00, 12.29s/it] + +{'loss': 0.4415, 'learning_rate': 8.06309259884441e-06, 'epoch': 0.58} + + 58%|█████▊ | 4244/7378 [14:33:33<10:42:00, 12.29s/it] + 58%|█████▊ | 4245/7378 [14:33:45<10:38:59, 12.24s/it] + +{'loss': 0.531, 'learning_rate': 8.058785771701497e-06, 'epoch': 0.58} + + 58%|█████▊ | 4245/7378 [14:33:45<10:38:59, 12.24s/it] + 58%|█████▊ | 
4246/7378 [14:33:58<10:46:15, 12.38s/it] + +{'loss': 0.4137, 'learning_rate': 8.054479318697203e-06, 'epoch': 0.58} + + 58%|█████▊ | 4246/7378 [14:33:58<10:46:15, 12.38s/it] + 58%|█████▊ | 4247/7378 [14:34:10<10:46:19, 12.39s/it] + +{'loss': 0.4805, 'learning_rate': 8.050173240661533e-06, 'epoch': 0.58} + + 58%|█████▊ | 4247/7378 [14:34:10<10:46:19, 12.39s/it] + 58%|█████▊ | 4248/7378 [14:34:23<10:48:01, 12.42s/it] + +{'loss': 0.4847, 'learning_rate': 8.04586753842441e-06, 'epoch': 0.58} + + 58%|█████▊ | 4248/7378 [14:34:23<10:48:01, 12.42s/it] + 58%|█████▊ | 4249/7378 [14:34:35<10:49:44, 12.46s/it] + +{'loss': 0.5065, 'learning_rate': 8.041562212815699e-06, 'epoch': 0.58} + + 58%|█████▊ | 4249/7378 [14:34:35<10:49:44, 12.46s/it] + 58%|█████▊ | 4250/7378 [14:34:48<10:52:35, 12.52s/it] + +{'loss': 0.4741, 'learning_rate': 8.037257264665174e-06, 'epoch': 0.58} + + 58%|█████▊ | 4250/7378 [14:34:48<10:52:35, 12.52s/it] + 58%|█████▊ | 4251/7378 [14:35:00<10:48:37, 12.45s/it] + +{'loss': 0.4621, 'learning_rate': 8.032952694802556e-06, 'epoch': 0.58} + + 58%|█████▊ | 4251/7378 [14:35:00<10:48:37, 12.45s/it] + 58%|█████▊ | 4252/7378 [14:35:12<10:44:00, 12.36s/it] + +{'loss': 0.4126, 'learning_rate': 8.028648504057477e-06, 'epoch': 0.58} + + 58%|█████▊ | 4252/7378 [14:35:12<10:44:00, 12.36s/it] + 58%|█████▊ | 4253/7378 [14:35:25<10:50:08, 12.48s/it] + +{'loss': 0.475, 'learning_rate': 8.024344693259505e-06, 'epoch': 0.58} + + 58%|█████▊ | 4253/7378 [14:35:25<10:50:08, 12.48s/it] + 58%|█████▊ | 4254/7378 [14:35:37<10:48:11, 12.45s/it] + +{'loss': 0.4591, 'learning_rate': 8.02004126323813e-06, 'epoch': 0.58} + + 58%|█████▊ | 4254/7378 [14:35:37<10:48:11, 12.45s/it] + 58%|█████▊ | 4255/7378 [14:35:50<10:51:14, 12.51s/it] + +{'loss': 0.4638, 'learning_rate': 8.015738214822774e-06, 'epoch': 0.58} + + 58%|█████▊ | 4255/7378 [14:35:50<10:51:14, 12.51s/it] + 58%|█████▊ | 4256/7378 [14:36:03<10:57:39, 12.64s/it] + +{'loss': 0.4594, 'learning_rate': 8.011435548842782e-06, 'epoch': 
0.58} + + 58%|█████▊ | 4256/7378 [14:36:03<10:57:39, 12.64s/it] + 58%|█████▊ | 4257/7378 [14:36:15<10:49:48, 12.49s/it] + +{'loss': 0.4296, 'learning_rate': 8.00713326612742e-06, 'epoch': 0.58} + + 58%|█████▊ | 4257/7378 [14:36:15<10:49:48, 12.49s/it] + 58%|█████▊ | 4258/7378 [14:36:28<10:46:41, 12.44s/it] + +{'loss': 0.4535, 'learning_rate': 8.002831367505892e-06, 'epoch': 0.58} + + 58%|█████▊ | 4258/7378 [14:36:28<10:46:41, 12.44s/it] + 58%|█████▊ | 4259/7378 [14:36:40<10:43:05, 12.37s/it] + +{'loss': 0.4722, 'learning_rate': 7.998529853807316e-06, 'epoch': 0.58} + + 58%|█████▊ | 4259/7378 [14:36:40<10:43:05, 12.37s/it] + 58%|█████▊ | 4260/7378 [14:36:52<10:38:49, 12.29s/it] + +{'loss': 0.4534, 'learning_rate': 7.994228725860744e-06, 'epoch': 0.58} + + 58%|█████▊ | 4260/7378 [14:36:52<10:38:49, 12.29s/it] + 58%|█████▊ | 4261/7378 [14:37:04<10:33:50, 12.20s/it] + +{'loss': 0.3961, 'learning_rate': 7.989927984495155e-06, 'epoch': 0.58} + + 58%|█████▊ | 4261/7378 [14:37:04<10:33:50, 12.20s/it] + 58%|█████▊ | 4262/7378 [14:37:16<10:38:29, 12.29s/it] + +{'loss': 0.4718, 'learning_rate': 7.985627630539443e-06, 'epoch': 0.58} + + 58%|█████▊ | 4262/7378 [14:37:16<10:38:29, 12.29s/it] + 58%|█████▊ | 4263/7378 [14:37:29<10:46:18, 12.45s/it] + +{'loss': 0.4522, 'learning_rate': 7.981327664822438e-06, 'epoch': 0.58} + + 58%|█████▊ | 4263/7378 [14:37:29<10:46:18, 12.45s/it] + 58%|█████▊ | 4264/7378 [14:37:42<10:51:41, 12.56s/it] + +{'loss': 0.4959, 'learning_rate': 7.977028088172889e-06, 'epoch': 0.58} + + 58%|█████▊ | 4264/7378 [14:37:42<10:51:41, 12.56s/it] + 58%|█████▊ | 4265/7378 [14:37:54<10:46:56, 12.47s/it] + +{'loss': 0.4904, 'learning_rate': 7.972728901419475e-06, 'epoch': 0.58} + + 58%|█████▊ | 4265/7378 [14:37:54<10:46:56, 12.47s/it] + 58%|█████▊ | 4266/7378 [14:38:07<10:43:55, 12.42s/it] + +{'loss': 0.4918, 'learning_rate': 7.968430105390792e-06, 'epoch': 0.58} + + 58%|█████▊ | 4266/7378 [14:38:07<10:43:55, 12.42s/it] + 58%|█████▊ | 4267/7378 [14:38:19<10:39:43, 
12.34s/it] + +{'loss': 0.5048, 'learning_rate': 7.964131700915368e-06, 'epoch': 0.58} + + 58%|█████▊ | 4267/7378 [14:38:19<10:39:43, 12.34s/it] + 58%|█████▊ | 4268/7378 [14:38:31<10:41:50, 12.38s/it] + +{'loss': 0.4673, 'learning_rate': 7.959833688821655e-06, 'epoch': 0.58} + + 58%|█████▊ | 4268/7378 [14:38:31<10:41:50, 12.38s/it] + 58%|█████▊ | 4269/7378 [14:38:43<10:35:32, 12.27s/it] + +{'loss': 0.4287, 'learning_rate': 7.955536069938022e-06, 'epoch': 0.58} + + 58%|█████▊ | 4269/7378 [14:38:43<10:35:32, 12.27s/it] + 58%|█████▊ | 4270/7378 [14:38:56<10:37:49, 12.31s/it] + +{'loss': 0.4384, 'learning_rate': 7.951238845092776e-06, 'epoch': 0.58} + + 58%|█████▊ | 4270/7378 [14:38:56<10:37:49, 12.31s/it] + 58%|█████▊ | 4271/7378 [14:39:08<10:36:44, 12.30s/it] + +{'loss': 0.4448, 'learning_rate': 7.94694201511413e-06, 'epoch': 0.58} + + 58%|█████▊ | 4271/7378 [14:39:08<10:36:44, 12.30s/it] + 58%|█████▊ | 4272/7378 [14:39:20<10:39:48, 12.36s/it] + +{'loss': 0.4282, 'learning_rate': 7.94264558083024e-06, 'epoch': 0.58} + + 58%|█████▊ | 4272/7378 [14:39:20<10:39:48, 12.36s/it] + 58%|█████▊ | 4273/7378 [14:39:32<10:35:41, 12.28s/it] + +{'loss': 0.4812, 'learning_rate': 7.93834954306917e-06, 'epoch': 0.58} + + 58%|█████▊ | 4273/7378 [14:39:32<10:35:41, 12.28s/it] + 58%|█████▊ | 4274/7378 [14:39:45<10:38:31, 12.34s/it] + +{'loss': 0.4607, 'learning_rate': 7.934053902658918e-06, 'epoch': 0.58} + + 58%|█████▊ | 4274/7378 [14:39:45<10:38:31, 12.34s/it] + 58%|█████▊ | 4275/7378 [14:39:58<10:46:54, 12.51s/it] + +{'loss': 0.4128, 'learning_rate': 7.929758660427398e-06, 'epoch': 0.58} + + 58%|█████▊ | 4275/7378 [14:39:58<10:46:54, 12.51s/it] + 58%|█████▊ | 4276/7378 [14:40:10<10:45:20, 12.48s/it] + +{'loss': 0.5246, 'learning_rate': 7.92546381720245e-06, 'epoch': 0.58} + + 58%|█████▊ | 4276/7378 [14:40:10<10:45:20, 12.48s/it] + 58%|█████▊ | 4277/7378 [14:40:23<10:42:20, 12.43s/it] + +{'loss': 0.4936, 'learning_rate': 7.921169373811843e-06, 'epoch': 0.58} + + 58%|█████▊ | 4277/7378 
[14:40:23<10:42:20, 12.43s/it] + 58%|█████▊ | 4278/7378 [14:40:35<10:36:50, 12.33s/it] + +{'loss': 0.4413, 'learning_rate': 7.916875331083258e-06, 'epoch': 0.58} + + 58%|█████▊ | 4278/7378 [14:40:35<10:36:50, 12.33s/it] + 58%|█████▊ | 4279/7378 [14:40:47<10:35:38, 12.31s/it] + +{'loss': 0.4511, 'learning_rate': 7.912581689844309e-06, 'epoch': 0.58} + + 58%|█████▊ | 4279/7378 [14:40:47<10:35:38, 12.31s/it] + 58%|█████▊ | 4280/7378 [14:40:59<10:33:58, 12.28s/it] + +{'loss': 0.512, 'learning_rate': 7.908288450922523e-06, 'epoch': 0.58} + + 58%|█████▊ | 4280/7378 [14:40:59<10:33:58, 12.28s/it] + 58%|█████▊ | 4281/7378 [14:41:11<10:34:38, 12.30s/it] + +{'loss': 0.424, 'learning_rate': 7.903995615145361e-06, 'epoch': 0.58} + + 58%|█████▊ | 4281/7378 [14:41:11<10:34:38, 12.30s/it] + 58%|█████▊ | 4282/7378 [14:41:24<10:33:37, 12.28s/it] + +{'loss': 0.4329, 'learning_rate': 7.899703183340195e-06, 'epoch': 0.58} + + 58%|█████▊ | 4282/7378 [14:41:24<10:33:37, 12.28s/it] + 58%|█████▊ | 4283/7378 [14:41:36<10:30:04, 12.21s/it] + +{'loss': 0.4033, 'learning_rate': 7.895411156334322e-06, 'epoch': 0.58} + + 58%|█████▊ | 4283/7378 [14:41:36<10:30:04, 12.21s/it] + 58%|█████▊ | 4284/7378 [14:41:48<10:28:54, 12.20s/it] + +{'loss': 0.4542, 'learning_rate': 7.89111953495497e-06, 'epoch': 0.58} + + 58%|█████▊ | 4284/7378 [14:41:48<10:28:54, 12.20s/it] + 58%|█████▊ | 4285/7378 [14:42:01<10:40:25, 12.42s/it] + +{'loss': 0.4584, 'learning_rate': 7.886828320029277e-06, 'epoch': 0.58} + + 58%|█████▊ | 4285/7378 [14:42:01<10:40:25, 12.42s/it] + 58%|█████▊ | 4286/7378 [14:42:14<10:46:48, 12.55s/it] + +{'loss': 0.4936, 'learning_rate': 7.882537512384308e-06, 'epoch': 0.58} + + 58%|█████▊ | 4286/7378 [14:42:14<10:46:48, 12.55s/it] + 58%|█████▊ | 4287/7378 [14:42:26<10:41:08, 12.45s/it] + +{'loss': 0.4916, 'learning_rate': 7.878247112847049e-06, 'epoch': 0.58} + + 58%|█████▊ | 4287/7378 [14:42:26<10:41:08, 12.45s/it] + 58%|█████▊ | 4288/7378 [14:42:39<10:45:57, 12.54s/it] + +{'loss': 0.4171, 
'learning_rate': 7.873957122244408e-06, 'epoch': 0.58} + + 58%|█████▊ | 4288/7378 [14:42:39<10:45:57, 12.54s/it] + 58%|█████▊ | 4289/7378 [14:42:51<10:42:34, 12.48s/it] + +{'loss': 0.5254, 'learning_rate': 7.869667541403212e-06, 'epoch': 0.58} + + 58%|█████▊ | 4289/7378 [14:42:51<10:42:34, 12.48s/it] + 58%|█████▊ | 4290/7378 [14:43:03<10:37:58, 12.40s/it] + +{'loss': 0.4848, 'learning_rate': 7.865378371150213e-06, 'epoch': 0.58} + + 58%|█████▊ | 4290/7378 [14:43:03<10:37:58, 12.40s/it] + 58%|█████▊ | 4291/7378 [14:43:16<10:39:00, 12.42s/it] + +{'loss': 0.4645, 'learning_rate': 7.86108961231208e-06, 'epoch': 0.58} + + 58%|█████▊ | 4291/7378 [14:43:16<10:39:00, 12.42s/it] + 58%|█████▊ | 4292/7378 [14:43:28<10:35:18, 12.35s/it] + +{'loss': 0.4857, 'learning_rate': 7.856801265715401e-06, 'epoch': 0.58} + + 58%|█████▊ | 4292/7378 [14:43:28<10:35:18, 12.35s/it] + 58%|█████▊ | 4293/7378 [14:43:40<10:34:24, 12.34s/it] + +{'loss': 0.5223, 'learning_rate': 7.852513332186695e-06, 'epoch': 0.58} + + 58%|█████▊ | 4293/7378 [14:43:40<10:34:24, 12.34s/it] + 58%|█████▊ | 4294/7378 [14:43:52<10:25:16, 12.16s/it] + +{'loss': 0.4861, 'learning_rate': 7.848225812552385e-06, 'epoch': 0.58} + + 58%|█████▊ | 4294/7378 [14:43:52<10:25:16, 12.16s/it] + 58%|█████▊ | 4295/7378 [14:44:04<10:30:15, 12.27s/it] + +{'loss': 0.4304, 'learning_rate': 7.843938707638831e-06, 'epoch': 0.58} + + 58%|█████▊ | 4295/7378 [14:44:04<10:30:15, 12.27s/it] + 58%|█████▊ | 4296/7378 [14:44:17<10:33:57, 12.34s/it] + +{'loss': 0.4096, 'learning_rate': 7.839652018272299e-06, 'epoch': 0.58} + + 58%|█████▊ | 4296/7378 [14:44:17<10:33:57, 12.34s/it] + 58%|█████▊ | 4297/7378 [14:44:29<10:34:02, 12.35s/it] + +{'loss': 0.4282, 'learning_rate': 7.835365745278987e-06, 'epoch': 0.58} + + 58%|█████▊ | 4297/7378 [14:44:29<10:34:02, 12.35s/it] + 58%|█████▊ | 4298/7378 [14:44:41<10:29:39, 12.27s/it] + +{'loss': 0.4538, 'learning_rate': 7.831079889485001e-06, 'epoch': 0.58} + + 58%|█████▊ | 4298/7378 [14:44:41<10:29:39, 
12.27s/it] + 58%|█████▊ | 4299/7378 [14:44:54<10:29:56, 12.28s/it] + +{'loss': 0.4177, 'learning_rate': 7.826794451716379e-06, 'epoch': 0.58} + + 58%|█████▊ | 4299/7378 [14:44:54<10:29:56, 12.28s/it] + 58%|█████▊ | 4300/7378 [14:45:06<10:25:06, 12.19s/it] + +{'loss': 0.5135, 'learning_rate': 7.822509432799068e-06, 'epoch': 0.58} + + 58%|█████▊ | 4300/7378 [14:45:06<10:25:06, 12.19s/it] + 58%|█████▊ | 4301/7378 [14:45:18<10:21:28, 12.12s/it] + +{'loss': 0.4222, 'learning_rate': 7.818224833558936e-06, 'epoch': 0.58} + + 58%|█████▊ | 4301/7378 [14:45:18<10:21:28, 12.12s/it] + 58%|█████▊ | 4302/7378 [14:45:30<10:25:21, 12.20s/it] + +{'loss': 0.4457, 'learning_rate': 7.813940654821774e-06, 'epoch': 0.58} + + 58%|█████▊ | 4302/7378 [14:45:30<10:25:21, 12.20s/it] + 58%|█████▊ | 4303/7378 [14:45:43<10:30:16, 12.30s/it] + +{'loss': 0.4962, 'learning_rate': 7.809656897413295e-06, 'epoch': 0.58} + + 58%|█████▊ | 4303/7378 [14:45:43<10:30:16, 12.30s/it] + 58%|█████▊ | 4304/7378 [14:45:56<10:41:42, 12.53s/it] + +{'loss': 0.5055, 'learning_rate': 7.805373562159122e-06, 'epoch': 0.58} + + 58%|█████▊ | 4304/7378 [14:45:56<10:41:42, 12.53s/it] + 58%|█████▊ | 4305/7378 [14:46:08<10:44:01, 12.57s/it] + +{'loss': 0.5165, 'learning_rate': 7.801090649884802e-06, 'epoch': 0.58} + + 58%|█████▊ | 4305/7378 [14:46:08<10:44:01, 12.57s/it] + 58%|█████▊ | 4306/7378 [14:46:20<10:34:31, 12.39s/it] + +{'loss': 0.4337, 'learning_rate': 7.796808161415797e-06, 'epoch': 0.58} + + 58%|█████▊ | 4306/7378 [14:46:20<10:34:31, 12.39s/it] + 58%|█████▊ | 4307/7378 [14:46:32<10:30:10, 12.31s/it] + +{'loss': 0.4436, 'learning_rate': 7.792526097577494e-06, 'epoch': 0.58} + + 58%|█████▊ | 4307/7378 [14:46:32<10:30:10, 12.31s/it] + 58%|█████▊ | 4308/7378 [14:46:44<10:23:20, 12.18s/it] + +{'loss': 0.4483, 'learning_rate': 7.788244459195192e-06, 'epoch': 0.58} + + 58%|█████▊ | 4308/7378 [14:46:44<10:23:20, 12.18s/it] + 58%|█████▊ | 4309/7378 [14:46:57<10:27:58, 12.28s/it] + +{'loss': 0.4541, 'learning_rate': 
7.783963247094103e-06, 'epoch': 0.58} + + 58%|█████▊ | 4309/7378 [14:46:57<10:27:58, 12.28s/it] + 58%|█████▊ | 4310/7378 [14:47:10<10:36:14, 12.44s/it] + +{'loss': 0.4047, 'learning_rate': 7.779682462099373e-06, 'epoch': 0.58} + + 58%|█████▊ | 4310/7378 [14:47:10<10:36:14, 12.44s/it] + 58%|█████▊ | 4311/7378 [14:47:22<10:28:48, 12.30s/it] + +{'loss': 0.4381, 'learning_rate': 7.77540210503605e-06, 'epoch': 0.58} + + 58%|█████▊ | 4311/7378 [14:47:22<10:28:48, 12.30s/it] + 58%|█████▊ | 4312/7378 [14:47:34<10:27:56, 12.29s/it] + +{'loss': 0.4702, 'learning_rate': 7.77112217672911e-06, 'epoch': 0.58} + + 58%|█████▊ | 4312/7378 [14:47:34<10:27:56, 12.29s/it] + 58%|█████▊ | 4313/7378 [14:47:46<10:29:28, 12.32s/it] + +{'loss': 0.4834, 'learning_rate': 7.766842678003438e-06, 'epoch': 0.58} + + 58%|█████▊ | 4313/7378 [14:47:46<10:29:28, 12.32s/it] + 58%|█████▊ | 4314/7378 [14:47:58<10:25:58, 12.26s/it] + +{'loss': 0.3975, 'learning_rate': 7.762563609683846e-06, 'epoch': 0.58} + + 58%|█████▊ | 4314/7378 [14:47:58<10:25:58, 12.26s/it] + 58%|█████▊ | 4315/7378 [14:48:10<10:20:42, 12.16s/it] + +{'loss': 0.4367, 'learning_rate': 7.758284972595049e-06, 'epoch': 0.58} + + 58%|█████▊ | 4315/7378 [14:48:10<10:20:42, 12.16s/it] + 58%|█████▊ | 4316/7378 [14:48:23<10:33:30, 12.41s/it] + +{'loss': 0.4748, 'learning_rate': 7.754006767561696e-06, 'epoch': 0.58} + + 58%|█████▊ | 4316/7378 [14:48:23<10:33:30, 12.41s/it] + 59%|█████▊ | 4317/7378 [14:48:37<10:48:44, 12.72s/it] + +{'loss': 0.499, 'learning_rate': 7.74972899540834e-06, 'epoch': 0.59} + + 59%|█████▊ | 4317/7378 [14:48:37<10:48:44, 12.72s/it] + 59%|█████▊ | 4318/7378 [14:48:49<10:40:31, 12.56s/it] + +{'loss': 0.4391, 'learning_rate': 7.745451656959452e-06, 'epoch': 0.59} + + 59%|█████▊ | 4318/7378 [14:48:49<10:40:31, 12.56s/it] + 59%|█████▊ | 4319/7378 [14:49:01<10:32:51, 12.41s/it] + +{'loss': 0.4603, 'learning_rate': 7.741174753039426e-06, 'epoch': 0.59} + + 59%|█████▊ | 4319/7378 [14:49:01<10:32:51, 12.41s/it] + 59%|█████▊ | 
4320/7378 [14:49:14<10:34:53, 12.46s/it] + +{'loss': 0.4196, 'learning_rate': 7.736898284472566e-06, 'epoch': 0.59} + + 59%|█████▊ | 4320/7378 [14:49:14<10:34:53, 12.46s/it] + 59%|█████▊ | 4321/7378 [14:49:26<10:37:26, 12.51s/it] + +{'loss': 0.5163, 'learning_rate': 7.732622252083097e-06, 'epoch': 0.59} + + 59%|█████▊ | 4321/7378 [14:49:26<10:37:26, 12.51s/it] + 59%|█████▊ | 4322/7378 [14:49:38<10:29:18, 12.36s/it] + +{'loss': 0.4702, 'learning_rate': 7.728346656695151e-06, 'epoch': 0.59} + + 59%|█████▊ | 4322/7378 [14:49:38<10:29:18, 12.36s/it] + 59%|█████▊ | 4323/7378 [14:49:50<10:25:23, 12.28s/it] + +{'loss': 0.403, 'learning_rate': 7.72407149913279e-06, 'epoch': 0.59} + + 59%|█████▊ | 4323/7378 [14:49:50<10:25:23, 12.28s/it] + 59%|█████▊ | 4324/7378 [14:50:02<10:24:00, 12.26s/it] + +{'loss': 0.4538, 'learning_rate': 7.719796780219979e-06, 'epoch': 0.59} + + 59%|█████▊ | 4324/7378 [14:50:02<10:24:00, 12.26s/it] + 59%|█████▊ | 4325/7378 [14:50:15<10:28:14, 12.35s/it] + +{'loss': 0.4642, 'learning_rate': 7.715522500780604e-06, 'epoch': 0.59} + + 59%|█████▊ | 4325/7378 [14:50:15<10:28:14, 12.35s/it] + 59%|█████▊ | 4326/7378 [14:50:27<10:29:28, 12.37s/it] + +{'loss': 0.5168, 'learning_rate': 7.711248661638467e-06, 'epoch': 0.59} + + 59%|█████▊ | 4326/7378 [14:50:27<10:29:28, 12.37s/it] + 59%|█████▊ | 4327/7378 [14:50:40<10:27:53, 12.35s/it] + +{'loss': 0.4833, 'learning_rate': 7.70697526361728e-06, 'epoch': 0.59} + + 59%|█████▊ | 4327/7378 [14:50:40<10:27:53, 12.35s/it] + 59%|█████▊ | 4328/7378 [14:50:52<10:26:03, 12.32s/it] + +{'loss': 0.5088, 'learning_rate': 7.702702307540675e-06, 'epoch': 0.59} + + 59%|█████▊ | 4328/7378 [14:50:52<10:26:03, 12.32s/it] + 59%|█████▊ | 4329/7378 [14:51:05<10:39:37, 12.59s/it] + +{'loss': 0.4427, 'learning_rate': 7.698429794232196e-06, 'epoch': 0.59} + + 59%|█████▊ | 4329/7378 [14:51:05<10:39:37, 12.59s/it] + 59%|█████▊ | 4330/7378 [14:51:18<10:37:11, 12.54s/it] + +{'loss': 0.4372, 'learning_rate': 7.694157724515309e-06, 'epoch': 
0.59} + + 59%|█████▊ | 4330/7378 [14:51:18<10:37:11, 12.54s/it] + 59%|█████▊ | 4331/7378 [14:51:30<10:32:29, 12.45s/it] + +{'loss': 0.4409, 'learning_rate': 7.68988609921338e-06, 'epoch': 0.59} + + 59%|█████▊ | 4331/7378 [14:51:30<10:32:29, 12.45s/it] + 59%|█████▊ | 4332/7378 [14:51:42<10:30:36, 12.42s/it] + +{'loss': 0.4445, 'learning_rate': 7.685614919149705e-06, 'epoch': 0.59} + + 59%|█████▊ | 4332/7378 [14:51:42<10:30:36, 12.42s/it] + 59%|█████▊ | 4333/7378 [14:51:54<10:24:05, 12.30s/it] + +{'loss': 0.4608, 'learning_rate': 7.681344185147485e-06, 'epoch': 0.59} + + 59%|█████▊ | 4333/7378 [14:51:54<10:24:05, 12.30s/it] + 59%|█████▊ | 4334/7378 [14:52:07<10:26:33, 12.35s/it] + +{'loss': 0.4868, 'learning_rate': 7.677073898029832e-06, 'epoch': 0.59} + + 59%|█████▊ | 4334/7378 [14:52:07<10:26:33, 12.35s/it] + 59%|█████▉ | 4335/7378 [14:52:19<10:22:33, 12.28s/it] + +{'loss': 0.479, 'learning_rate': 7.672804058619784e-06, 'epoch': 0.59} + + 59%|█████▉ | 4335/7378 [14:52:19<10:22:33, 12.28s/it] + 59%|█████▉ | 4336/7378 [14:52:31<10:22:54, 12.29s/it] + +{'loss': 0.5187, 'learning_rate': 7.668534667740281e-06, 'epoch': 0.59} + + 59%|█████▉ | 4336/7378 [14:52:31<10:22:54, 12.29s/it] + 59%|█████▉ | 4337/7378 [14:52:44<10:24:40, 12.32s/it] + +{'loss': 0.5211, 'learning_rate': 7.664265726214183e-06, 'epoch': 0.59} + + 59%|█████▉ | 4337/7378 [14:52:44<10:24:40, 12.32s/it] + 59%|█████▉ | 4338/7378 [14:52:56<10:19:45, 12.23s/it] + +{'loss': 0.4394, 'learning_rate': 7.65999723486426e-06, 'epoch': 0.59} + + 59%|█████▉ | 4338/7378 [14:52:56<10:19:45, 12.23s/it] + 59%|█████▉ | 4339/7378 [14:53:08<10:18:50, 12.22s/it] + +{'loss': 0.4479, 'learning_rate': 7.655729194513201e-06, 'epoch': 0.59} + + 59%|█████▉ | 4339/7378 [14:53:08<10:18:50, 12.22s/it] + 59%|█████▉ | 4340/7378 [14:53:20<10:23:46, 12.32s/it] + +{'loss': 0.4529, 'learning_rate': 7.651461605983599e-06, 'epoch': 0.59} + + 59%|█████▉ | 4340/7378 [14:53:20<10:23:46, 12.32s/it] + 59%|█████▉ | 4341/7378 [14:53:32<10:20:32, 
12.26s/it] + +{'loss': 0.3894, 'learning_rate': 7.64719447009797e-06, 'epoch': 0.59} + + 59%|█████▉ | 4341/7378 [14:53:32<10:20:32, 12.26s/it] + 59%|█████▉ | 4342/7378 [14:53:44<10:13:46, 12.13s/it] + +{'loss': 0.4694, 'learning_rate': 7.642927787678733e-06, 'epoch': 0.59} + + 59%|█████▉ | 4342/7378 [14:53:44<10:13:46, 12.13s/it] + 59%|█████▉ | 4343/7378 [14:53:57<10:15:19, 12.16s/it] + +{'loss': 0.4629, 'learning_rate': 7.638661559548222e-06, 'epoch': 0.59} + + 59%|█████▉ | 4343/7378 [14:53:57<10:15:19, 12.16s/it] + 59%|█████▉ | 4344/7378 [14:54:09<10:18:40, 12.23s/it] + +{'loss': 0.4582, 'learning_rate': 7.634395786528695e-06, 'epoch': 0.59} + + 59%|█████▉ | 4344/7378 [14:54:09<10:18:40, 12.23s/it] + 59%|█████▉ | 4345/7378 [14:54:21<10:16:53, 12.20s/it] + +{'loss': 0.4141, 'learning_rate': 7.630130469442302e-06, 'epoch': 0.59} + + 59%|█████▉ | 4345/7378 [14:54:21<10:16:53, 12.20s/it] + 59%|█████▉ | 4346/7378 [14:54:34<10:22:08, 12.31s/it] + +{'loss': 0.4309, 'learning_rate': 7.625865609111121e-06, 'epoch': 0.59} + + 59%|█████▉ | 4346/7378 [14:54:34<10:22:08, 12.31s/it] + 59%|█████▉ | 4347/7378 [14:54:46<10:16:52, 12.21s/it] + +{'loss': 0.4712, 'learning_rate': 7.621601206357139e-06, 'epoch': 0.59} + + 59%|█████▉ | 4347/7378 [14:54:46<10:16:52, 12.21s/it] + 59%|█████▉ | 4348/7378 [14:54:58<10:19:45, 12.27s/it] + +{'loss': 0.454, 'learning_rate': 7.61733726200225e-06, 'epoch': 0.59} + + 59%|█████▉ | 4348/7378 [14:54:58<10:19:45, 12.27s/it] + 59%|█████▉ | 4349/7378 [14:55:10<10:19:23, 12.27s/it] + +{'loss': 0.4499, 'learning_rate': 7.613073776868266e-06, 'epoch': 0.59} + + 59%|█████▉ | 4349/7378 [14:55:10<10:19:23, 12.27s/it] + 59%|█████▉ | 4350/7378 [14:55:24<10:39:49, 12.68s/it] + +{'loss': 0.4923, 'learning_rate': 7.608810751776902e-06, 'epoch': 0.59} + + 59%|█████▉ | 4350/7378 [14:55:24<10:39:49, 12.68s/it] + 59%|█████▉ | 4351/7378 [14:55:36<10:35:16, 12.59s/it] + +{'loss': 0.4894, 'learning_rate': 7.604548187549794e-06, 'epoch': 0.59} + + 59%|█████▉ | 4351/7378 
[14:55:36<10:35:16, 12.59s/it] + 59%|█████▉ | 4352/7378 [14:55:49<10:34:35, 12.58s/it] + +{'loss': 0.5134, 'learning_rate': 7.6002860850084815e-06, 'epoch': 0.59} + + 59%|█████▉ | 4352/7378 [14:55:49<10:34:35, 12.58s/it] + 59%|█████▉ | 4353/7378 [14:56:02<10:36:39, 12.63s/it] + +{'loss': 0.4711, 'learning_rate': 7.596024444974417e-06, 'epoch': 0.59} + + 59%|█████▉ | 4353/7378 [14:56:02<10:36:39, 12.63s/it] + 59%|█████▉ | 4354/7378 [14:56:14<10:31:06, 12.52s/it] + +{'loss': 0.4979, 'learning_rate': 7.591763268268968e-06, 'epoch': 0.59} + + 59%|█████▉ | 4354/7378 [14:56:14<10:31:06, 12.52s/it] + 59%|█████▉ | 4355/7378 [14:56:27<10:33:25, 12.57s/it] + +{'loss': 0.4764, 'learning_rate': 7.587502555713405e-06, 'epoch': 0.59} + + 59%|█████▉ | 4355/7378 [14:56:27<10:33:25, 12.57s/it] + 59%|█████▉ | 4356/7378 [14:56:39<10:30:00, 12.51s/it] + +{'loss': 0.4567, 'learning_rate': 7.5832423081289195e-06, 'epoch': 0.59} + + 59%|█████▉ | 4356/7378 [14:56:39<10:30:00, 12.51s/it] + 59%|█████▉ | 4357/7378 [14:56:51<10:24:43, 12.41s/it] + +{'loss': 0.464, 'learning_rate': 7.5789825263366025e-06, 'epoch': 0.59} + + 59%|█████▉ | 4357/7378 [14:56:51<10:24:43, 12.41s/it] + 59%|█████▉ | 4358/7378 [14:57:03<10:21:35, 12.35s/it] + +{'loss': 0.4814, 'learning_rate': 7.574723211157464e-06, 'epoch': 0.59} + + 59%|█████▉ | 4358/7378 [14:57:03<10:21:35, 12.35s/it] + 59%|█████▉ | 4359/7378 [14:57:16<10:21:25, 12.35s/it] + +{'loss': 0.4341, 'learning_rate': 7.5704643634124155e-06, 'epoch': 0.59} + + 59%|█████▉ | 4359/7378 [14:57:16<10:21:25, 12.35s/it] + 59%|█████▉ | 4360/7378 [14:57:28<10:22:03, 12.37s/it] + +{'loss': 0.4408, 'learning_rate': 7.566205983922289e-06, 'epoch': 0.59} + + 59%|█████▉ | 4360/7378 [14:57:28<10:22:03, 12.37s/it] + 59%|█████▉ | 4361/7378 [14:57:40<10:21:00, 12.35s/it] + +{'loss': 0.4464, 'learning_rate': 7.561948073507818e-06, 'epoch': 0.59} + + 59%|█████▉ | 4361/7378 [14:57:40<10:21:00, 12.35s/it] + 59%|█████▉ | 4362/7378 [14:57:52<10:15:52, 12.25s/it] + +{'loss': 0.4648, 
'learning_rate': 7.557690632989644e-06, 'epoch': 0.59} + + 59%|█████▉ | 4362/7378 [14:57:52<10:15:52, 12.25s/it] + 59%|█████▉ | 4363/7378 [14:58:05<10:14:28, 12.23s/it] + +{'loss': 0.4984, 'learning_rate': 7.553433663188328e-06, 'epoch': 0.59} + + 59%|█████▉ | 4363/7378 [14:58:05<10:14:28, 12.23s/it] + 59%|█████▉ | 4364/7378 [14:58:17<10:20:44, 12.36s/it] + +{'loss': 0.424, 'learning_rate': 7.549177164924329e-06, 'epoch': 0.59} + + 59%|█████▉ | 4364/7378 [14:58:17<10:20:44, 12.36s/it] + 59%|█████▉ | 4365/7378 [14:58:30<10:21:03, 12.37s/it] + +{'loss': 0.4534, 'learning_rate': 7.544921139018027e-06, 'epoch': 0.59} + + 59%|█████▉ | 4365/7378 [14:58:30<10:21:03, 12.37s/it] + 59%|█████▉ | 4366/7378 [14:58:45<11:11:52, 13.38s/it] + +{'loss': 0.4369, 'learning_rate': 7.5406655862896945e-06, 'epoch': 0.59} + + 59%|█████▉ | 4366/7378 [14:58:45<11:11:52, 13.38s/it] + 59%|█████▉ | 4367/7378 [14:58:57<10:51:05, 12.97s/it] + +{'loss': 0.4395, 'learning_rate': 7.536410507559533e-06, 'epoch': 0.59} + + 59%|█████▉ | 4367/7378 [14:58:57<10:51:05, 12.97s/it] + 59%|█████▉ | 4368/7378 [14:59:10<10:46:36, 12.89s/it] + +{'loss': 0.4386, 'learning_rate': 7.5321559036476375e-06, 'epoch': 0.59} + + 59%|█████▉ | 4368/7378 [14:59:10<10:46:36, 12.89s/it] + 59%|█████▉ | 4369/7378 [14:59:22<10:38:57, 12.74s/it] + +{'loss': 0.436, 'learning_rate': 7.527901775374014e-06, 'epoch': 0.59} + + 59%|█████▉ | 4369/7378 [14:59:22<10:38:57, 12.74s/it] + 59%|█████▉ | 4370/7378 [14:59:35<10:36:24, 12.69s/it] + +{'loss': 0.402, 'learning_rate': 7.523648123558582e-06, 'epoch': 0.59} + + 59%|█████▉ | 4370/7378 [14:59:35<10:36:24, 12.69s/it] + 59%|█████▉ | 4371/7378 [14:59:48<10:33:29, 12.64s/it] + +{'loss': 0.4835, 'learning_rate': 7.519394949021166e-06, 'epoch': 0.59} + + 59%|█████▉ | 4371/7378 [14:59:48<10:33:29, 12.64s/it] + 59%|█████▉ | 4372/7378 [15:00:00<10:32:31, 12.63s/it] + +{'loss': 0.4577, 'learning_rate': 7.515142252581499e-06, 'epoch': 0.59} + + 59%|█████▉ | 4372/7378 [15:00:00<10:32:31, 
12.63s/it] + 59%|█████▉ | 4373/7378 [15:00:13<10:32:21, 12.63s/it] + +{'loss': 0.515, 'learning_rate': 7.5108900350592185e-06, 'epoch': 0.59} + + 59%|█████▉ | 4373/7378 [15:00:13<10:32:21, 12.63s/it] + 59%|█████▉ | 4374/7378 [15:00:25<10:28:29, 12.55s/it] + +{'loss': 0.4652, 'learning_rate': 7.506638297273877e-06, 'epoch': 0.59} + + 59%|█████▉ | 4374/7378 [15:00:25<10:28:29, 12.55s/it] + 59%|█████▉ | 4375/7378 [15:00:38<10:26:03, 12.51s/it] + +{'loss': 0.4028, 'learning_rate': 7.502387040044927e-06, 'epoch': 0.59} + + 59%|█████▉ | 4375/7378 [15:00:38<10:26:03, 12.51s/it] + 59%|█████▉ | 4376/7378 [15:00:50<10:20:44, 12.41s/it] + +{'loss': 0.3962, 'learning_rate': 7.498136264191736e-06, 'epoch': 0.59} + + 59%|█████▉ | 4376/7378 [15:00:50<10:20:44, 12.41s/it] + 59%|█████▉ | 4377/7378 [15:01:02<10:17:14, 12.34s/it] + +{'loss': 0.3823, 'learning_rate': 7.49388597053357e-06, 'epoch': 0.59} + + 59%|█████▉ | 4377/7378 [15:01:02<10:17:14, 12.34s/it] + 59%|█████▉ | 4378/7378 [15:01:14<10:19:25, 12.39s/it] + +{'loss': 0.4759, 'learning_rate': 7.489636159889607e-06, 'epoch': 0.59} + + 59%|█████▉ | 4378/7378 [15:01:14<10:19:25, 12.39s/it] + 59%|█████▉ | 4379/7378 [15:01:27<10:16:49, 12.34s/it] + +{'loss': 0.4674, 'learning_rate': 7.4853868330789345e-06, 'epoch': 0.59} + + 59%|█████▉ | 4379/7378 [15:01:27<10:16:49, 12.34s/it] + 59%|█████▉ | 4380/7378 [15:01:39<10:16:55, 12.35s/it] + +{'loss': 0.4748, 'learning_rate': 7.4811379909205395e-06, 'epoch': 0.59} + + 59%|█████▉ | 4380/7378 [15:01:39<10:16:55, 12.35s/it] + 59%|█████▉ | 4381/7378 [15:01:51<10:13:38, 12.29s/it] + +{'loss': 0.4706, 'learning_rate': 7.476889634233324e-06, 'epoch': 0.59} + + 59%|█████▉ | 4381/7378 [15:01:51<10:13:38, 12.29s/it] + 59%|█████▉ | 4382/7378 [15:02:03<10:12:45, 12.27s/it] + +{'loss': 0.4043, 'learning_rate': 7.472641763836088e-06, 'epoch': 0.59} + + 59%|█████▉ | 4382/7378 [15:02:03<10:12:45, 12.27s/it] + 59%|█████▉ | 4383/7378 [15:02:16<10:12:22, 12.27s/it] + +{'loss': 0.4166, 'learning_rate': 
7.4683943805475465e-06, 'epoch': 0.59} + + 59%|█████▉ | 4383/7378 [15:02:16<10:12:22, 12.27s/it] + 59%|█████▉ | 4384/7378 [15:02:28<10:11:52, 12.26s/it] + +{'loss': 0.5003, 'learning_rate': 7.464147485186311e-06, 'epoch': 0.59} + + 59%|█████▉ | 4384/7378 [15:02:28<10:11:52, 12.26s/it] + 59%|█████▉ | 4385/7378 [15:02:40<10:12:16, 12.27s/it] + +{'loss': 0.4491, 'learning_rate': 7.459901078570909e-06, 'epoch': 0.59} + + 59%|█████▉ | 4385/7378 [15:02:40<10:12:16, 12.27s/it] + 59%|█████▉ | 4386/7378 [15:02:53<10:13:19, 12.30s/it] + +{'loss': 0.4607, 'learning_rate': 7.455655161519767e-06, 'epoch': 0.59} + + 59%|█████▉ | 4386/7378 [15:02:53<10:13:19, 12.30s/it] + 59%|█████▉ | 4387/7378 [15:03:05<10:14:51, 12.33s/it] + +{'loss': 0.4279, 'learning_rate': 7.451409734851216e-06, 'epoch': 0.59} + + 59%|█████▉ | 4387/7378 [15:03:05<10:14:51, 12.33s/it] + 59%|█████▉ | 4388/7378 [15:03:17<10:12:42, 12.30s/it] + +{'loss': 0.4528, 'learning_rate': 7.4471647993835e-06, 'epoch': 0.59} + + 59%|█████▉ | 4388/7378 [15:03:17<10:12:42, 12.30s/it] + 59%|█████▉ | 4389/7378 [15:03:30<10:18:33, 12.42s/it] + +{'loss': 0.4995, 'learning_rate': 7.442920355934758e-06, 'epoch': 0.59} + + 59%|█████▉ | 4389/7378 [15:03:30<10:18:33, 12.42s/it] + 60%|█████▉ | 4390/7378 [15:03:42<10:16:33, 12.38s/it] + +{'loss': 0.5482, 'learning_rate': 7.4386764053230434e-06, 'epoch': 0.6} + + 60%|█████▉ | 4390/7378 [15:03:42<10:16:33, 12.38s/it] + 60%|█████▉ | 4391/7378 [15:03:55<10:18:52, 12.43s/it] + +{'loss': 0.4611, 'learning_rate': 7.434432948366315e-06, 'epoch': 0.6} + + 60%|█████▉ | 4391/7378 [15:03:55<10:18:52, 12.43s/it] + 60%|█████▉ | 4392/7378 [15:04:07<10:20:25, 12.47s/it] + +{'loss': 0.4485, 'learning_rate': 7.430189985882427e-06, 'epoch': 0.6} + + 60%|█████▉ | 4392/7378 [15:04:07<10:20:25, 12.47s/it] + 60%|█████▉ | 4393/7378 [15:04:19<10:14:54, 12.36s/it] + +{'loss': 0.4909, 'learning_rate': 7.425947518689147e-06, 'epoch': 0.6} + + 60%|█████▉ | 4393/7378 [15:04:19<10:14:54, 12.36s/it] + 60%|█████▉ | 
4394/7378 [15:04:32<10:15:22, 12.37s/it] + +{'loss': 0.4007, 'learning_rate': 7.421705547604144e-06, 'epoch': 0.6} + + 60%|█████▉ | 4394/7378 [15:04:32<10:15:22, 12.37s/it] + 60%|█████▉ | 4395/7378 [15:04:44<10:12:04, 12.31s/it] + +{'loss': 0.4729, 'learning_rate': 7.417464073444989e-06, 'epoch': 0.6} + + 60%|█████▉ | 4395/7378 [15:04:44<10:12:04, 12.31s/it] + 60%|█████▉ | 4396/7378 [15:04:56<10:11:33, 12.31s/it] + +{'loss': 0.4518, 'learning_rate': 7.413223097029163e-06, 'epoch': 0.6} + + 60%|█████▉ | 4396/7378 [15:04:56<10:11:33, 12.31s/it] + 60%|█████▉ | 4397/7378 [15:05:09<10:10:57, 12.30s/it] + +{'loss': 0.5047, 'learning_rate': 7.4089826191740435e-06, 'epoch': 0.6} + + 60%|█████▉ | 4397/7378 [15:05:09<10:10:57, 12.30s/it] + 60%|█████▉ | 4398/7378 [15:05:21<10:10:30, 12.29s/it] + +{'loss': 0.3857, 'learning_rate': 7.40474264069692e-06, 'epoch': 0.6} + + 60%|█████▉ | 4398/7378 [15:05:21<10:10:30, 12.29s/it] + 60%|█████▉ | 4399/7378 [15:05:34<10:16:56, 12.43s/it] + +{'loss': 0.4457, 'learning_rate': 7.400503162414978e-06, 'epoch': 0.6} + + 60%|█████▉ | 4399/7378 [15:05:34<10:16:56, 12.43s/it] + 60%|█████▉ | 4400/7378 [15:05:46<10:16:59, 12.43s/it] + +{'loss': 0.5482, 'learning_rate': 7.396264185145317e-06, 'epoch': 0.6} + + 60%|█████▉ | 4400/7378 [15:05:46<10:16:59, 12.43s/it] + 60%|█████▉ | 4401/7378 [15:05:58<10:16:19, 12.42s/it] + +{'loss': 0.3975, 'learning_rate': 7.392025709704924e-06, 'epoch': 0.6} + + 60%|█████▉ | 4401/7378 [15:05:58<10:16:19, 12.42s/it] + 60%|█████▉ | 4402/7378 [15:06:11<10:13:35, 12.37s/it] + +{'loss': 0.4865, 'learning_rate': 7.38778773691071e-06, 'epoch': 0.6} + + 60%|█████▉ | 4402/7378 [15:06:11<10:13:35, 12.37s/it] + 60%|█████▉ | 4403/7378 [15:06:23<10:13:06, 12.37s/it] + +{'loss': 0.4433, 'learning_rate': 7.383550267579469e-06, 'epoch': 0.6} + + 60%|█████▉ | 4403/7378 [15:06:23<10:13:06, 12.37s/it] + 60%|█████▉ | 4404/7378 [15:06:35<10:08:35, 12.28s/it] + +{'loss': 0.4483, 'learning_rate': 7.379313302527908e-06, 'epoch': 0.6} + + 
60%|█████▉ | 4404/7378 [15:06:35<10:08:35, 12.28s/it] + 60%|█████▉ | 4405/7378 [15:06:47<10:07:19, 12.26s/it] + +{'loss': 0.4089, 'learning_rate': 7.375076842572641e-06, 'epoch': 0.6} + + 60%|█████▉ | 4405/7378 [15:06:47<10:07:19, 12.26s/it] + 60%|█████▉ | 4406/7378 [15:07:00<10:11:59, 12.35s/it] + +{'loss': 0.4589, 'learning_rate': 7.370840888530173e-06, 'epoch': 0.6} + + 60%|█████▉ | 4406/7378 [15:07:00<10:11:59, 12.35s/it] + 60%|█████▉ | 4407/7378 [15:07:12<10:15:15, 12.43s/it] + +{'loss': 0.463, 'learning_rate': 7.366605441216922e-06, 'epoch': 0.6} + + 60%|█████▉ | 4407/7378 [15:07:12<10:15:15, 12.43s/it] + 60%|█████▉ | 4408/7378 [15:07:25<10:24:04, 12.61s/it] + +{'loss': 0.5158, 'learning_rate': 7.362370501449201e-06, 'epoch': 0.6} + + 60%|█████▉ | 4408/7378 [15:07:25<10:24:04, 12.61s/it] + 60%|█████▉ | 4409/7378 [15:07:38<10:15:46, 12.44s/it] + +{'loss': 0.4903, 'learning_rate': 7.358136070043231e-06, 'epoch': 0.6} + + 60%|█████▉ | 4409/7378 [15:07:38<10:15:46, 12.44s/it] + 60%|█████▉ | 4410/7378 [15:07:50<10:12:51, 12.39s/it] + +{'loss': 0.3842, 'learning_rate': 7.353902147815128e-06, 'epoch': 0.6} + + 60%|█████▉ | 4410/7378 [15:07:50<10:12:51, 12.39s/it] + 60%|█████▉ | 4411/7378 [15:08:02<10:10:23, 12.34s/it] + +{'loss': 0.4411, 'learning_rate': 7.349668735580921e-06, 'epoch': 0.6} + + 60%|█████▉ | 4411/7378 [15:08:02<10:10:23, 12.34s/it] + 60%|█████▉ | 4412/7378 [15:08:14<10:10:57, 12.36s/it] + +{'loss': 0.506, 'learning_rate': 7.345435834156529e-06, 'epoch': 0.6} + + 60%|█████▉ | 4412/7378 [15:08:14<10:10:57, 12.36s/it] + 60%|█████▉ | 4413/7378 [15:08:27<10:12:53, 12.40s/it] + +{'loss': 0.4168, 'learning_rate': 7.3412034443577786e-06, 'epoch': 0.6} + + 60%|█████▉ | 4413/7378 [15:08:27<10:12:53, 12.40s/it] + 60%|█████▉ | 4414/7378 [15:08:43<11:06:01, 13.48s/it] + +{'loss': 0.4856, 'learning_rate': 7.336971567000396e-06, 'epoch': 0.6} + + 60%|█████▉ | 4414/7378 [15:08:43<11:06:01, 13.48s/it] + 60%|█████▉ | 4415/7378 [15:08:59<11:39:10, 14.16s/it] + 
+{'loss': 0.4195, 'learning_rate': 7.332740202900008e-06, 'epoch': 0.6} + + 60%|█████▉ | 4415/7378 [15:08:59<11:39:10, 14.16s/it] + 60%|█████▉ | 4416/7378 [15:09:12<11:20:47, 13.79s/it] + +{'loss': 0.5094, 'learning_rate': 7.328509352872149e-06, 'epoch': 0.6} + + 60%|█████▉ | 4416/7378 [15:09:12<11:20:47, 13.79s/it] + 60%|█████▉ | 4417/7378 [15:09:24<10:53:19, 13.24s/it] + +{'loss': 0.4264, 'learning_rate': 7.324279017732241e-06, 'epoch': 0.6} + + 60%|█████▉ | 4417/7378 [15:09:24<10:53:19, 13.24s/it] + 60%|█████▉ | 4418/7378 [15:09:36<10:34:08, 12.85s/it] + +{'loss': 0.438, 'learning_rate': 7.320049198295622e-06, 'epoch': 0.6} + + 60%|█████▉ | 4418/7378 [15:09:36<10:34:08, 12.85s/it] + 60%|█████▉ | 4419/7378 [15:09:51<11:16:10, 13.71s/it] + +{'loss': 0.5133, 'learning_rate': 7.31581989537752e-06, 'epoch': 0.6} + + 60%|█████▉ | 4419/7378 [15:09:51<11:16:10, 13.71s/it] + 60%|█████▉ | 4420/7378 [15:10:03<10:50:10, 13.19s/it] + +{'loss': 0.4153, 'learning_rate': 7.311591109793068e-06, 'epoch': 0.6} + + 60%|█████▉ | 4420/7378 [15:10:03<10:50:10, 13.19s/it] + 60%|█████▉ | 4421/7378 [15:10:19<11:29:25, 13.99s/it] + +{'loss': 0.46, 'learning_rate': 7.3073628423573e-06, 'epoch': 0.6} + + 60%|█████▉ | 4421/7378 [15:10:19<11:29:25, 13.99s/it] + 60%|█████▉ | 4422/7378 [15:10:31<11:04:48, 13.49s/it] + +{'loss': 0.4154, 'learning_rate': 7.303135093885141e-06, 'epoch': 0.6} + + 60%|█████▉ | 4422/7378 [15:10:31<11:04:48, 13.49s/it] + 60%|█████▉ | 4423/7378 [15:10:43<10:44:00, 13.08s/it] + +{'loss': 0.4938, 'learning_rate': 7.298907865191432e-06, 'epoch': 0.6} + + 60%|█████▉ | 4423/7378 [15:10:43<10:44:00, 13.08s/it] + 60%|█████▉ | 4424/7378 [15:10:56<10:40:04, 13.00s/it] + +{'loss': 0.3724, 'learning_rate': 7.294681157090899e-06, 'epoch': 0.6} + + 60%|█████▉ | 4424/7378 [15:10:56<10:40:04, 13.00s/it] + 60%|█████▉ | 4425/7378 [15:11:08<10:26:40, 12.73s/it] + +{'loss': 0.4388, 'learning_rate': 7.290454970398177e-06, 'epoch': 0.6} + + 60%|█████▉ | 4425/7378 [15:11:08<10:26:40, 
12.73s/it] + 60%|█████▉ | 4426/7378 [15:11:21<10:19:14, 12.59s/it] + +{'loss': 0.4801, 'learning_rate': 7.286229305927796e-06, 'epoch': 0.6} + + 60%|█████▉ | 4426/7378 [15:11:21<10:19:14, 12.59s/it] + 60%|██████ | 4427/7378 [15:11:33<10:20:38, 12.62s/it] + +{'loss': 0.4116, 'learning_rate': 7.282004164494187e-06, 'epoch': 0.6} + + 60%|██████ | 4427/7378 [15:11:33<10:20:38, 12.62s/it] + 60%|██████ | 4428/7378 [15:11:46<10:15:55, 12.53s/it] + +{'loss': 0.454, 'learning_rate': 7.277779546911682e-06, 'epoch': 0.6} + + 60%|██████ | 4428/7378 [15:11:46<10:15:55, 12.53s/it] + 60%|██████ | 4429/7378 [15:12:01<10:51:01, 13.25s/it] + +{'loss': 0.491, 'learning_rate': 7.273555453994504e-06, 'epoch': 0.6} + + 60%|██████ | 4429/7378 [15:12:01<10:51:01, 13.25s/it] + 60%|██████ | 4430/7378 [15:12:12<10:30:45, 12.84s/it] + +{'loss': 0.3872, 'learning_rate': 7.269331886556786e-06, 'epoch': 0.6} + + 60%|██████ | 4430/7378 [15:12:12<10:30:45, 12.84s/it] + 60%|██████ | 4431/7378 [15:12:25<10:22:06, 12.67s/it] + +{'loss': 0.4712, 'learning_rate': 7.2651088454125515e-06, 'epoch': 0.6} + + 60%|██████ | 4431/7378 [15:12:25<10:22:06, 12.67s/it] + 60%|██████ | 4432/7378 [15:12:37<10:18:54, 12.60s/it] + +{'loss': 0.4697, 'learning_rate': 7.260886331375729e-06, 'epoch': 0.6} + + 60%|██████ | 4432/7378 [15:12:37<10:18:54, 12.60s/it] + 60%|██████ | 4433/7378 [15:12:49<10:11:43, 12.46s/it] + +{'loss': 0.5183, 'learning_rate': 7.256664345260134e-06, 'epoch': 0.6} + + 60%|██████ | 4433/7378 [15:12:49<10:11:43, 12.46s/it] + 60%|██████ | 4434/7378 [15:13:02<10:08:04, 12.39s/it] + +{'loss': 0.5073, 'learning_rate': 7.252442887879496e-06, 'epoch': 0.6} + + 60%|██████ | 4434/7378 [15:13:02<10:08:04, 12.39s/it] + 60%|██████ | 4435/7378 [15:13:14<10:04:02, 12.31s/it] + +{'loss': 0.4845, 'learning_rate': 7.248221960047437e-06, 'epoch': 0.6} + + 60%|██████ | 4435/7378 [15:13:14<10:04:02, 12.31s/it] + 60%|██████ | 4436/7378 [15:13:26<10:03:37, 12.31s/it] + +{'loss': 0.4864, 'learning_rate': 
7.2440015625774655e-06, 'epoch': 0.6} + + 60%|██████ | 4436/7378 [15:13:26<10:03:37, 12.31s/it] + 60%|██████ | 4437/7378 [15:13:38<9:59:47, 12.24s/it] + +{'loss': 0.4661, 'learning_rate': 7.239781696283003e-06, 'epoch': 0.6} + + 60%|██████ | 4437/7378 [15:13:38<9:59:47, 12.24s/it] + 60%|██████ | 4438/7378 [15:13:51<10:02:52, 12.30s/it] + +{'loss': 0.4487, 'learning_rate': 7.235562361977364e-06, 'epoch': 0.6} + + 60%|██████ | 4438/7378 [15:13:51<10:02:52, 12.30s/it] + 60%|██████ | 4439/7378 [15:14:03<10:02:36, 12.30s/it] + +{'loss': 0.4822, 'learning_rate': 7.231343560473753e-06, 'epoch': 0.6} + + 60%|██████ | 4439/7378 [15:14:03<10:02:36, 12.30s/it] + 60%|██████ | 4440/7378 [15:14:15<10:01:13, 12.28s/it] + +{'loss': 0.4498, 'learning_rate': 7.227125292585283e-06, 'epoch': 0.6} + + 60%|██████ | 4440/7378 [15:14:15<10:01:13, 12.28s/it] + 60%|██████ | 4441/7378 [15:14:27<10:02:55, 12.32s/it] + +{'loss': 0.4448, 'learning_rate': 7.222907559124955e-06, 'epoch': 0.6} + + 60%|██████ | 4441/7378 [15:14:27<10:02:55, 12.32s/it] + 60%|██████ | 4442/7378 [15:14:40<9:59:58, 12.26s/it] + +{'loss': 0.443, 'learning_rate': 7.218690360905675e-06, 'epoch': 0.6} + + 60%|██████ | 4442/7378 [15:14:40<9:59:58, 12.26s/it] + 60%|██████ | 4443/7378 [15:14:52<10:02:26, 12.32s/it] + +{'loss': 0.441, 'learning_rate': 7.21447369874024e-06, 'epoch': 0.6} + + 60%|██████ | 4443/7378 [15:14:52<10:02:26, 12.32s/it] + 60%|██████ | 4444/7378 [15:15:05<10:08:44, 12.45s/it] + +{'loss': 0.4172, 'learning_rate': 7.210257573441346e-06, 'epoch': 0.6} + + 60%|██████ | 4444/7378 [15:15:05<10:08:44, 12.45s/it] + 60%|██████ | 4445/7378 [15:15:17<10:06:38, 12.41s/it] + +{'loss': 0.4464, 'learning_rate': 7.206041985821583e-06, 'epoch': 0.6} + + 60%|██████ | 4445/7378 [15:15:17<10:06:38, 12.41s/it] + 60%|██████ | 4446/7378 [15:15:30<10:07:51, 12.44s/it] + +{'loss': 0.4481, 'learning_rate': 7.2018269366934435e-06, 'epoch': 0.6} + + 60%|██████ | 4446/7378 [15:15:30<10:07:51, 12.44s/it] + 60%|██████ | 4447/7378 
[15:15:42<10:06:22, 12.41s/it] + +{'loss': 0.4109, 'learning_rate': 7.197612426869309e-06, 'epoch': 0.6} + + 60%|██████ | 4447/7378 [15:15:42<10:06:22, 12.41s/it] + 60%|██████ | 4448/7378 [15:15:54<10:04:12, 12.37s/it] + +{'loss': 0.4767, 'learning_rate': 7.19339845716146e-06, 'epoch': 0.6} + + 60%|██████ | 4448/7378 [15:15:54<10:04:12, 12.37s/it] + 60%|██████ | 4449/7378 [15:16:07<10:10:39, 12.51s/it] + +{'loss': 0.3972, 'learning_rate': 7.189185028382076e-06, 'epoch': 0.6} + + 60%|██████ | 4449/7378 [15:16:07<10:10:39, 12.51s/it] + 60%|██████ | 4450/7378 [15:16:19<10:07:31, 12.45s/it] + +{'loss': 0.4838, 'learning_rate': 7.184972141343225e-06, 'epoch': 0.6} + + 60%|██████ | 4450/7378 [15:16:19<10:07:31, 12.45s/it] + 60%|██████ | 4451/7378 [15:16:32<10:10:21, 12.51s/it] + +{'loss': 0.3561, 'learning_rate': 7.18075979685688e-06, 'epoch': 0.6} + + 60%|██████ | 4451/7378 [15:16:32<10:10:21, 12.51s/it] + 60%|██████ | 4452/7378 [15:16:44<10:06:38, 12.44s/it] + +{'loss': 0.4597, 'learning_rate': 7.1765479957349e-06, 'epoch': 0.6} + + 60%|██████ | 4452/7378 [15:16:44<10:06:38, 12.44s/it] + 60%|██████ | 4453/7378 [15:16:57<10:08:45, 12.49s/it] + +{'loss': 0.4209, 'learning_rate': 7.172336738789048e-06, 'epoch': 0.6} + + 60%|██████ | 4453/7378 [15:16:57<10:08:45, 12.49s/it] + 60%|██████ | 4454/7378 [15:17:10<10:10:45, 12.53s/it] + +{'loss': 0.4219, 'learning_rate': 7.168126026830975e-06, 'epoch': 0.6} + + 60%|██████ | 4454/7378 [15:17:10<10:10:45, 12.53s/it] + 60%|██████ | 4455/7378 [15:17:22<10:10:09, 12.52s/it] + +{'loss': 0.4718, 'learning_rate': 7.163915860672227e-06, 'epoch': 0.6} + + 60%|██████ | 4455/7378 [15:17:22<10:10:09, 12.52s/it] + 60%|██████ | 4456/7378 [15:17:34<10:04:11, 12.41s/it] + +{'loss': 0.4229, 'learning_rate': 7.159706241124253e-06, 'epoch': 0.6} + + 60%|██████ | 4456/7378 [15:17:34<10:04:11, 12.41s/it] + 60%|██████ | 4457/7378 [15:17:46<9:58:52, 12.30s/it] + +{'loss': 0.5294, 'learning_rate': 7.155497168998386e-06, 'epoch': 0.6} + + 60%|██████ | 
4457/7378 [15:17:46<9:58:52, 12.30s/it] + 60%|██████ | 4458/7378 [15:17:59<9:59:04, 12.31s/it] + +{'loss': 0.4834, 'learning_rate': 7.151288645105866e-06, 'epoch': 0.6} + + 60%|██████ | 4458/7378 [15:17:59<9:59:04, 12.31s/it] + 60%|██████ | 4459/7378 [15:18:11<9:54:20, 12.22s/it] + +{'loss': 0.4751, 'learning_rate': 7.147080670257811e-06, 'epoch': 0.6} + + 60%|██████ | 4459/7378 [15:18:11<9:54:20, 12.22s/it] + 60%|██████ | 4460/7378 [15:18:23<9:54:46, 12.23s/it] + +{'loss': 0.4237, 'learning_rate': 7.14287324526525e-06, 'epoch': 0.6} + + 60%|██████ | 4460/7378 [15:18:23<9:54:46, 12.23s/it] + 60%|██████ | 4461/7378 [15:18:35<9:57:56, 12.30s/it] + +{'loss': 0.4414, 'learning_rate': 7.138666370939093e-06, 'epoch': 0.6} + + 60%|██████ | 4461/7378 [15:18:35<9:57:56, 12.30s/it] + 60%|██████ | 4462/7378 [15:18:48<10:04:29, 12.44s/it] + +{'loss': 0.4928, 'learning_rate': 7.134460048090153e-06, 'epoch': 0.6} + + 60%|██████ | 4462/7378 [15:18:48<10:04:29, 12.44s/it] + 60%|██████ | 4463/7378 [15:19:01<10:08:17, 12.52s/it] + +{'loss': 0.4534, 'learning_rate': 7.1302542775291315e-06, 'epoch': 0.6} + + 60%|██████ | 4463/7378 [15:19:01<10:08:17, 12.52s/it] + 61%|██████ | 4464/7378 [15:19:13<10:05:03, 12.46s/it] + +{'loss': 0.4582, 'learning_rate': 7.126049060066621e-06, 'epoch': 0.61} + + 61%|██████ | 4464/7378 [15:19:13<10:05:03, 12.46s/it] + 61%|██████ | 4465/7378 [15:19:26<10:14:37, 12.66s/it] + +{'loss': 0.4456, 'learning_rate': 7.121844396513117e-06, 'epoch': 0.61} + + 61%|██████ | 4465/7378 [15:19:26<10:14:37, 12.66s/it] + 61%|██████ | 4466/7378 [15:19:38<10:07:37, 12.52s/it] + +{'loss': 0.411, 'learning_rate': 7.117640287678997e-06, 'epoch': 0.61} + + 61%|██████ | 4466/7378 [15:19:38<10:07:37, 12.52s/it] + 61%|██████ | 4467/7378 [15:19:50<9:59:46, 12.36s/it] + +{'loss': 0.4616, 'learning_rate': 7.1134367343745436e-06, 'epoch': 0.61} + + 61%|██████ | 4467/7378 [15:19:50<9:59:46, 12.36s/it] + 61%|██████ | 4468/7378 [15:20:03<9:56:37, 12.30s/it] + +{'loss': 0.4142, 
'learning_rate': 7.109233737409919e-06, 'epoch': 0.61} + + 61%|██████ | 4468/7378 [15:20:03<9:56:37, 12.30s/it] + 61%|██████ | 4469/7378 [15:20:15<9:59:06, 12.36s/it] + +{'loss': 0.4418, 'learning_rate': 7.1050312975951915e-06, 'epoch': 0.61} + + 61%|██████ | 4469/7378 [15:20:15<9:59:06, 12.36s/it] + 61%|██████ | 4470/7378 [15:20:27<9:54:21, 12.26s/it] + +{'loss': 0.4547, 'learning_rate': 7.1008294157403105e-06, 'epoch': 0.61} + + 61%|██████ | 4470/7378 [15:20:27<9:54:21, 12.26s/it] + 61%|██████ | 4471/7378 [15:20:39<9:55:22, 12.29s/it] + +{'loss': 0.4064, 'learning_rate': 7.096628092655126e-06, 'epoch': 0.61} + + 61%|██████ | 4471/7378 [15:20:39<9:55:22, 12.29s/it] + 61%|██████ | 4472/7378 [15:20:52<9:58:27, 12.36s/it] + +{'loss': 0.4891, 'learning_rate': 7.092427329149376e-06, 'epoch': 0.61} + + 61%|██████ | 4472/7378 [15:20:52<9:58:27, 12.36s/it] + 61%|██████ | 4473/7378 [15:21:04<9:56:08, 12.31s/it] + +{'loss': 0.4953, 'learning_rate': 7.088227126032689e-06, 'epoch': 0.61} + + 61%|██████ | 4473/7378 [15:21:04<9:56:08, 12.31s/it] + 61%|██████ | 4474/7378 [15:21:16<9:53:25, 12.26s/it] + +{'loss': 0.4443, 'learning_rate': 7.084027484114595e-06, 'epoch': 0.61} + + 61%|██████ | 4474/7378 [15:21:16<9:53:25, 12.26s/it] + 61%|██████ | 4475/7378 [15:21:28<9:50:57, 12.21s/it] + +{'loss': 0.5159, 'learning_rate': 7.0798284042045005e-06, 'epoch': 0.61} + + 61%|██████ | 4475/7378 [15:21:28<9:50:57, 12.21s/it] + 61%|██████ | 4476/7378 [15:21:40<9:48:51, 12.17s/it] + +{'loss': 0.4584, 'learning_rate': 7.075629887111721e-06, 'epoch': 0.61} + + 61%|██████ | 4476/7378 [15:21:40<9:48:51, 12.17s/it] + 61%|██████ | 4477/7378 [15:21:53<9:51:17, 12.23s/it] + +{'loss': 0.4089, 'learning_rate': 7.071431933645446e-06, 'epoch': 0.61} + + 61%|██████ | 4477/7378 [15:21:53<9:51:17, 12.23s/it] + 61%|██████ | 4478/7378 [15:22:05<9:55:19, 12.32s/it] + +{'loss': 0.4813, 'learning_rate': 7.067234544614773e-06, 'epoch': 0.61} + + 61%|██████ | 4478/7378 [15:22:05<9:55:19, 12.32s/it] + 61%|██████ | 
4479/7378 [15:22:18<9:56:58, 12.36s/it] + +{'loss': 0.4225, 'learning_rate': 7.0630377208286816e-06, 'epoch': 0.61} + + 61%|██████ | 4479/7378 [15:22:18<9:56:58, 12.36s/it] + 61%|██████ | 4480/7378 [15:22:30<10:00:27, 12.43s/it] + +{'loss': 0.4458, 'learning_rate': 7.058841463096042e-06, 'epoch': 0.61} + + 61%|██████ | 4480/7378 [15:22:30<10:00:27, 12.43s/it] + 61%|██████ | 4481/7378 [15:22:43<9:56:18, 12.35s/it] + +{'loss': 0.45, 'learning_rate': 7.054645772225617e-06, 'epoch': 0.61} + + 61%|██████ | 4481/7378 [15:22:43<9:56:18, 12.35s/it] + 61%|██████ | 4482/7378 [15:22:56<10:04:33, 12.53s/it] + +{'loss': 0.432, 'learning_rate': 7.05045064902606e-06, 'epoch': 0.61} + + 61%|██████ | 4482/7378 [15:22:56<10:04:33, 12.53s/it] + 61%|██████ | 4483/7378 [15:23:08<10:02:33, 12.49s/it] + +{'loss': 0.4453, 'learning_rate': 7.046256094305917e-06, 'epoch': 0.61} + + 61%|██████ | 4483/7378 [15:23:08<10:02:33, 12.49s/it] + 61%|██████ | 4484/7378 [15:23:20<9:54:00, 12.32s/it] + +{'loss': 0.3812, 'learning_rate': 7.042062108873622e-06, 'epoch': 0.61} + + 61%|██████ | 4484/7378 [15:23:20<9:54:00, 12.32s/it] + 61%|██████ | 4485/7378 [15:23:32<9:46:16, 12.16s/it] + +{'loss': 0.4691, 'learning_rate': 7.037868693537497e-06, 'epoch': 0.61} + + 61%|██████ | 4485/7378 [15:23:32<9:46:16, 12.16s/it] + 61%|██████ | 4486/7378 [15:23:44<9:43:25, 12.10s/it] + +{'loss': 0.4595, 'learning_rate': 7.033675849105763e-06, 'epoch': 0.61} + + 61%|██████ | 4486/7378 [15:23:44<9:43:25, 12.10s/it] + 61%|██████ | 4487/7378 [15:23:56<9:51:52, 12.28s/it] + +{'loss': 0.3784, 'learning_rate': 7.029483576386519e-06, 'epoch': 0.61} + + 61%|██████ | 4487/7378 [15:23:56<9:51:52, 12.28s/it] + 61%|██████ | 4488/7378 [15:24:09<9:56:16, 12.38s/it] + +{'loss': 0.4546, 'learning_rate': 7.025291876187765e-06, 'epoch': 0.61} + + 61%|██████ | 4488/7378 [15:24:09<9:56:16, 12.38s/it] + 61%|██████ | 4489/7378 [15:24:21<9:55:26, 12.37s/it] + +{'loss': 0.439, 'learning_rate': 7.021100749317382e-06, 'epoch': 0.61} + + 
61%|██████ | 4489/7378 [15:24:21<9:55:26, 12.37s/it] + 61%|██████ | 4490/7378 [15:24:34<9:54:27, 12.35s/it] + +{'loss': 0.4177, 'learning_rate': 7.016910196583145e-06, 'epoch': 0.61} + + 61%|██████ | 4490/7378 [15:24:34<9:54:27, 12.35s/it] + 61%|██████ | 4491/7378 [15:24:45<9:46:14, 12.18s/it] + +{'loss': 0.4231, 'learning_rate': 7.012720218792719e-06, 'epoch': 0.61} + + 61%|██████ | 4491/7378 [15:24:45<9:46:14, 12.18s/it] + 61%|██████ | 4492/7378 [15:24:58<9:45:55, 12.18s/it] + +{'loss': 0.4276, 'learning_rate': 7.008530816753652e-06, 'epoch': 0.61} + + 61%|██████ | 4492/7378 [15:24:58<9:45:55, 12.18s/it] + 61%|██████ | 4493/7378 [15:25:10<9:48:02, 12.23s/it] + +{'loss': 0.4167, 'learning_rate': 7.004341991273391e-06, 'epoch': 0.61} + + 61%|██████ | 4493/7378 [15:25:10<9:48:02, 12.23s/it] + 61%|██████ | 4494/7378 [15:25:22<9:43:25, 12.14s/it] + +{'loss': 0.4267, 'learning_rate': 7.000153743159263e-06, 'epoch': 0.61} + + 61%|██████ | 4494/7378 [15:25:22<9:43:25, 12.14s/it] + 61%|██████ | 4495/7378 [15:25:34<9:47:40, 12.23s/it] + +{'loss': 0.3985, 'learning_rate': 6.99596607321849e-06, 'epoch': 0.61} + + 61%|██████ | 4495/7378 [15:25:34<9:47:40, 12.23s/it] + 61%|██████ | 4496/7378 [15:25:46<9:43:52, 12.16s/it] + +{'loss': 0.4475, 'learning_rate': 6.991778982258176e-06, 'epoch': 0.61} + + 61%|██████ | 4496/7378 [15:25:46<9:43:52, 12.16s/it] + 61%|██████ | 4497/7378 [15:25:58<9:44:14, 12.17s/it] + +{'loss': 0.4217, 'learning_rate': 6.987592471085322e-06, 'epoch': 0.61} + + 61%|██████ | 4497/7378 [15:25:58<9:44:14, 12.17s/it] + 61%|██████ | 4498/7378 [15:26:11<9:47:18, 12.24s/it] + +{'loss': 0.454, 'learning_rate': 6.983406540506809e-06, 'epoch': 0.61} + + 61%|██████ | 4498/7378 [15:26:11<9:47:18, 12.24s/it] + 61%|██████ | 4499/7378 [15:26:23<9:46:11, 12.22s/it] + +{'loss': 0.3939, 'learning_rate': 6.979221191329408e-06, 'epoch': 0.61} + + 61%|██████ | 4499/7378 [15:26:23<9:46:11, 12.22s/it] + 61%|██████ | 4500/7378 [15:26:35<9:46:11, 12.22s/it] + +{'loss': 0.4896, 
'learning_rate': 6.975036424359783e-06, 'epoch': 0.61} + + 61%|██████ | 4500/7378 [15:26:35<9:46:11, 12.22s/it] + 61%|██████ | 4501/7378 [15:26:47<9:43:40, 12.17s/it] + +{'loss': 0.4348, 'learning_rate': 6.970852240404479e-06, 'epoch': 0.61} + + 61%|██████ | 4501/7378 [15:26:47<9:43:40, 12.17s/it] + 61%|██████ | 4502/7378 [15:26:59<9:43:56, 12.18s/it] + +{'loss': 0.4182, 'learning_rate': 6.966668640269938e-06, 'epoch': 0.61} + + 61%|██████ | 4502/7378 [15:26:59<9:43:56, 12.18s/it] + 61%|██████ | 4503/7378 [15:27:12<9:44:28, 12.20s/it] + +{'loss': 0.5156, 'learning_rate': 6.962485624762475e-06, 'epoch': 0.61} + + 61%|██████ | 4503/7378 [15:27:12<9:44:28, 12.20s/it] + 61%|██████ | 4504/7378 [15:27:24<9:45:30, 12.22s/it] + +{'loss': 0.4051, 'learning_rate': 6.958303194688307e-06, 'epoch': 0.61} + + 61%|██████ | 4504/7378 [15:27:24<9:45:30, 12.22s/it] + 61%|██████ | 4505/7378 [15:27:36<9:47:09, 12.26s/it] + +{'loss': 0.4354, 'learning_rate': 6.954121350853529e-06, 'epoch': 0.61} + + 61%|██████ | 4505/7378 [15:27:36<9:47:09, 12.26s/it] + 61%|██████ | 4506/7378 [15:27:49<9:48:28, 12.29s/it] + +{'loss': 0.4902, 'learning_rate': 6.949940094064127e-06, 'epoch': 0.61} + + 61%|██████ | 4506/7378 [15:27:49<9:48:28, 12.29s/it] + 61%|██████ | 4507/7378 [15:28:01<9:53:27, 12.40s/it] + +{'loss': 0.4951, 'learning_rate': 6.9457594251259734e-06, 'epoch': 0.61} + + 61%|██████ | 4507/7378 [15:28:01<9:53:27, 12.40s/it] + 61%|██████ | 4508/7378 [15:28:13<9:47:10, 12.28s/it] + +{'loss': 0.4216, 'learning_rate': 6.941579344844822e-06, 'epoch': 0.61} + + 61%|██████ | 4508/7378 [15:28:13<9:47:10, 12.28s/it] + 61%|██████ | 4509/7378 [15:28:25<9:44:36, 12.23s/it] + +{'loss': 0.3983, 'learning_rate': 6.937399854026325e-06, 'epoch': 0.61} + + 61%|██████ | 4509/7378 [15:28:25<9:44:36, 12.23s/it] + 61%|██████ | 4510/7378 [15:28:38<9:43:08, 12.20s/it] + +{'loss': 0.4533, 'learning_rate': 6.933220953476007e-06, 'epoch': 0.61} + + 61%|██████ | 4510/7378 [15:28:38<9:43:08, 12.20s/it] + 61%|██████ | 
4511/7378 [15:28:50<9:38:59, 12.12s/it] + +{'loss': 0.4973, 'learning_rate': 6.929042643999291e-06, 'epoch': 0.61} + + 61%|██████ | 4511/7378 [15:28:50<9:38:59, 12.12s/it] + 61%|██████ | 4512/7378 [15:29:02<9:41:38, 12.18s/it] + +{'loss': 0.4457, 'learning_rate': 6.924864926401475e-06, 'epoch': 0.61} + + 61%|██████ | 4512/7378 [15:29:02<9:41:38, 12.18s/it] + 61%|██████ | 4513/7378 [15:29:14<9:36:17, 12.07s/it] + +{'loss': 0.4533, 'learning_rate': 6.920687801487755e-06, 'epoch': 0.61} + + 61%|██████ | 4513/7378 [15:29:14<9:36:17, 12.07s/it] + 61%|██████ | 4514/7378 [15:29:26<9:37:38, 12.10s/it] + +{'loss': 0.519, 'learning_rate': 6.916511270063204e-06, 'epoch': 0.61} + + 61%|██████ | 4514/7378 [15:29:26<9:37:38, 12.10s/it] + 61%|██████ | 4515/7378 [15:29:38<9:39:21, 12.14s/it] + +{'loss': 0.4952, 'learning_rate': 6.9123353329327795e-06, 'epoch': 0.61} + + 61%|██████ | 4515/7378 [15:29:38<9:39:21, 12.14s/it] + 61%|██████ | 4516/7378 [15:29:50<9:38:00, 12.12s/it] + +{'loss': 0.4714, 'learning_rate': 6.908159990901333e-06, 'epoch': 0.61} + + 61%|██████ | 4516/7378 [15:29:50<9:38:00, 12.12s/it] + 61%|██████ | 4517/7378 [15:30:02<9:38:49, 12.14s/it] + +{'loss': 0.4919, 'learning_rate': 6.90398524477359e-06, 'epoch': 0.61} + + 61%|██████ | 4517/7378 [15:30:02<9:38:49, 12.14s/it] + 61%|██████ | 4518/7378 [15:30:14<9:38:11, 12.13s/it] + +{'loss': 0.4397, 'learning_rate': 6.8998110953541755e-06, 'epoch': 0.61} + + 61%|██████ | 4518/7378 [15:30:14<9:38:11, 12.13s/it] + 61%|██████ | 4519/7378 [15:30:26<9:34:58, 12.07s/it] + +{'loss': 0.3962, 'learning_rate': 6.895637543447584e-06, 'epoch': 0.61} + + 61%|██████ | 4519/7378 [15:30:26<9:34:58, 12.07s/it] + 61%|██████▏ | 4520/7378 [15:30:38<9:32:47, 12.02s/it] + +{'loss': 0.4453, 'learning_rate': 6.891464589858203e-06, 'epoch': 0.61} + + 61%|██████▏ | 4520/7378 [15:30:38<9:32:47, 12.02s/it] + 61%|██████▏ | 4521/7378 [15:30:51<9:35:52, 12.09s/it] + +{'loss': 0.4911, 'learning_rate': 6.887292235390312e-06, 'epoch': 0.61} + + 
61%|██████▏ | 4521/7378 [15:30:51<9:35:52, 12.09s/it] + 61%|██████▏ | 4522/7378 [15:31:03<9:39:09, 12.17s/it] + +{'loss': 0.3803, 'learning_rate': 6.883120480848058e-06, 'epoch': 0.61} + + 61%|██████▏ | 4522/7378 [15:31:03<9:39:09, 12.17s/it] + 61%|██████▏ | 4523/7378 [15:31:15<9:38:32, 12.16s/it] + +{'loss': 0.4538, 'learning_rate': 6.878949327035487e-06, 'epoch': 0.61} + + 61%|██████▏ | 4523/7378 [15:31:15<9:38:32, 12.16s/it] + 61%|██████▏ | 4524/7378 [15:31:28<9:45:29, 12.31s/it] + +{'loss': 0.4799, 'learning_rate': 6.874778774756521e-06, 'epoch': 0.61} + + 61%|██████▏ | 4524/7378 [15:31:28<9:45:29, 12.31s/it] + 61%|██████▏ | 4525/7378 [15:31:40<9:45:58, 12.32s/it] + +{'loss': 0.4156, 'learning_rate': 6.870608824814966e-06, 'epoch': 0.61} + + 61%|██████▏ | 4525/7378 [15:31:40<9:45:58, 12.32s/it] + 61%|██████▏ | 4526/7378 [15:31:52<9:43:20, 12.27s/it] + +{'loss': 0.438, 'learning_rate': 6.866439478014519e-06, 'epoch': 0.61} + + 61%|██████▏ | 4526/7378 [15:31:52<9:43:20, 12.27s/it] + 61%|██████▏ | 4527/7378 [15:32:05<9:51:50, 12.46s/it] + +{'loss': 0.4273, 'learning_rate': 6.862270735158754e-06, 'epoch': 0.61} + + 61%|██████▏ | 4527/7378 [15:32:05<9:51:50, 12.46s/it] + 61%|██████▏ | 4528/7378 [15:32:17<9:43:24, 12.28s/it] + +{'loss': 0.375, 'learning_rate': 6.858102597051132e-06, 'epoch': 0.61} + + 61%|██████▏ | 4528/7378 [15:32:17<9:43:24, 12.28s/it] + 61%|██████▏ | 4529/7378 [15:32:29<9:41:51, 12.25s/it] + +{'loss': 0.415, 'learning_rate': 6.853935064494993e-06, 'epoch': 0.61} + + 61%|██████▏ | 4529/7378 [15:32:29<9:41:51, 12.25s/it] + 61%|██████▏ | 4530/7378 [15:32:41<9:40:34, 12.23s/it] + +{'loss': 0.5039, 'learning_rate': 6.849768138293569e-06, 'epoch': 0.61} + + 61%|██████▏ | 4530/7378 [15:32:41<9:40:34, 12.23s/it] + 61%|██████▏ | 4531/7378 [15:32:54<9:41:06, 12.25s/it] + +{'loss': 0.4871, 'learning_rate': 6.8456018192499654e-06, 'epoch': 0.61} + + 61%|██████▏ | 4531/7378 [15:32:54<9:41:06, 12.25s/it] + 61%|██████▏ | 4532/7378 [15:33:06<9:38:12, 12.19s/it] + 
+{'loss': 0.4577, 'learning_rate': 6.8414361081671776e-06, 'epoch': 0.61} + + 61%|██████▏ | 4532/7378 [15:33:06<9:38:12, 12.19s/it] + 61%|██████▏ | 4533/7378 [15:33:18<9:39:47, 12.23s/it] + +{'loss': 0.485, 'learning_rate': 6.837271005848081e-06, 'epoch': 0.61} + + 61%|██████▏ | 4533/7378 [15:33:18<9:39:47, 12.23s/it] + 61%|██████▏ | 4534/7378 [15:33:30<9:40:14, 12.24s/it] + +{'loss': 0.4472, 'learning_rate': 6.8331065130954285e-06, 'epoch': 0.61} + + 61%|██████▏ | 4534/7378 [15:33:30<9:40:14, 12.24s/it] + 61%|██████▏ | 4535/7378 [15:33:43<9:42:03, 12.28s/it] + +{'loss': 0.4445, 'learning_rate': 6.828942630711869e-06, 'epoch': 0.61} + + 61%|██████▏ | 4535/7378 [15:33:43<9:42:03, 12.28s/it] + 61%|██████▏ | 4536/7378 [15:33:55<9:41:38, 12.28s/it] + +{'loss': 0.3706, 'learning_rate': 6.824779359499918e-06, 'epoch': 0.61} + + 61%|██████▏ | 4536/7378 [15:33:55<9:41:38, 12.28s/it] + 61%|██████▏ | 4537/7378 [15:34:08<9:50:45, 12.48s/it] + +{'loss': 0.5141, 'learning_rate': 6.8206167002619885e-06, 'epoch': 0.61} + + 61%|██████▏ | 4537/7378 [15:34:08<9:50:45, 12.48s/it] + 62%|██████▏ | 4538/7378 [15:34:20<9:46:51, 12.40s/it] + +{'loss': 0.4664, 'learning_rate': 6.816454653800359e-06, 'epoch': 0.62} + + 62%|██████▏ | 4538/7378 [15:34:20<9:46:51, 12.40s/it] + 62%|██████▏ | 4539/7378 [15:34:32<9:46:30, 12.40s/it] + +{'loss': 0.4771, 'learning_rate': 6.8122932209172075e-06, 'epoch': 0.62} + + 62%|██████▏ | 4539/7378 [15:34:32<9:46:30, 12.40s/it] + 62%|██████▏ | 4540/7378 [15:34:45<9:44:18, 12.35s/it] + +{'loss': 0.4882, 'learning_rate': 6.80813240241458e-06, 'epoch': 0.62} + + 62%|██████▏ | 4540/7378 [15:34:45<9:44:18, 12.35s/it] + 62%|██████▏ | 4541/7378 [15:34:57<9:42:06, 12.31s/it] + +{'loss': 0.4604, 'learning_rate': 6.803972199094409e-06, 'epoch': 0.62} + + 62%|██████▏ | 4541/7378 [15:34:57<9:42:06, 12.31s/it] + 62%|██████▏ | 4542/7378 [15:35:09<9:38:43, 12.24s/it] + +{'loss': 0.406, 'learning_rate': 6.799812611758511e-06, 'epoch': 0.62} + + 62%|██████▏ | 4542/7378 
[15:35:09<9:38:43, 12.24s/it] + 62%|██████▏ | 4543/7378 [15:35:21<9:37:36, 12.22s/it] + +{'loss': 0.469, 'learning_rate': 6.7956536412085775e-06, 'epoch': 0.62} + + 62%|██████▏ | 4543/7378 [15:35:21<9:37:36, 12.22s/it] + 62%|██████▏ | 4544/7378 [15:35:34<9:39:21, 12.27s/it] + +{'loss': 0.435, 'learning_rate': 6.791495288246188e-06, 'epoch': 0.62} + + 62%|██████▏ | 4544/7378 [15:35:34<9:39:21, 12.27s/it] + 62%|██████▏ | 4545/7378 [15:35:46<9:48:36, 12.47s/it] + +{'loss': 0.551, 'learning_rate': 6.787337553672798e-06, 'epoch': 0.62} + + 62%|██████▏ | 4545/7378 [15:35:46<9:48:36, 12.47s/it] + 62%|██████▏ | 4546/7378 [15:35:59<9:42:55, 12.35s/it] + +{'loss': 0.4616, 'learning_rate': 6.783180438289749e-06, 'epoch': 0.62} + + 62%|██████▏ | 4546/7378 [15:35:59<9:42:55, 12.35s/it] + 62%|██████▏ | 4547/7378 [15:36:11<9:49:48, 12.50s/it] + +{'loss': 0.482, 'learning_rate': 6.779023942898255e-06, 'epoch': 0.62} + + 62%|██████▏ | 4547/7378 [15:36:11<9:49:48, 12.50s/it] + 62%|██████▏ | 4548/7378 [15:36:23<9:43:29, 12.37s/it] + +{'loss': 0.4453, 'learning_rate': 6.774868068299421e-06, 'epoch': 0.62} + + 62%|██████▏ | 4548/7378 [15:36:23<9:43:29, 12.37s/it] + 62%|██████▏ | 4549/7378 [15:36:36<9:42:09, 12.35s/it] + +{'loss': 0.5121, 'learning_rate': 6.770712815294223e-06, 'epoch': 0.62} + + 62%|██████▏ | 4549/7378 [15:36:36<9:42:09, 12.35s/it] + 62%|██████▏ | 4550/7378 [15:36:48<9:41:19, 12.33s/it] + +{'loss': 0.4119, 'learning_rate': 6.766558184683518e-06, 'epoch': 0.62} + + 62%|██████▏ | 4550/7378 [15:36:48<9:41:19, 12.33s/it] + 62%|██████▏ | 4551/7378 [15:37:01<9:45:17, 12.42s/it] + +{'loss': 0.3645, 'learning_rate': 6.762404177268053e-06, 'epoch': 0.62} + + 62%|██████▏ | 4551/7378 [15:37:01<9:45:17, 12.42s/it] + 62%|██████▏ | 4552/7378 [15:37:13<9:47:04, 12.46s/it] + +{'loss': 0.4969, 'learning_rate': 6.7582507938484406e-06, 'epoch': 0.62} + + 62%|██████▏ | 4552/7378 [15:37:13<9:47:04, 12.46s/it] + 62%|██████▏ | 4553/7378 [15:37:25<9:41:38, 12.35s/it] + +{'loss': 0.3642, 
'learning_rate': 6.754098035225187e-06, 'epoch': 0.62} + + 62%|██████▏ | 4553/7378 [15:37:25<9:41:38, 12.35s/it] + 62%|██████▏ | 4554/7378 [15:37:38<9:40:50, 12.34s/it] + +{'loss': 0.4529, 'learning_rate': 6.749945902198667e-06, 'epoch': 0.62} + + 62%|██████▏ | 4554/7378 [15:37:38<9:40:50, 12.34s/it] + 62%|██████▏ | 4555/7378 [15:37:50<9:38:36, 12.30s/it] + +{'loss': 0.3677, 'learning_rate': 6.745794395569142e-06, 'epoch': 0.62} + + 62%|██████▏ | 4555/7378 [15:37:50<9:38:36, 12.30s/it] + 62%|██████▏ | 4556/7378 [15:38:02<9:35:30, 12.24s/it] + +{'loss': 0.5, 'learning_rate': 6.741643516136746e-06, 'epoch': 0.62} + + 62%|██████▏ | 4556/7378 [15:38:02<9:35:30, 12.24s/it] + 62%|██████▏ | 4557/7378 [15:38:14<9:38:34, 12.31s/it] + +{'loss': 0.4281, 'learning_rate': 6.7374932647015e-06, 'epoch': 0.62} + + 62%|██████▏ | 4557/7378 [15:38:14<9:38:34, 12.31s/it] + 62%|██████▏ | 4558/7378 [15:38:27<9:36:37, 12.27s/it] + +{'loss': 0.3528, 'learning_rate': 6.733343642063299e-06, 'epoch': 0.62} + + 62%|██████▏ | 4558/7378 [15:38:27<9:36:37, 12.27s/it] + 62%|██████▏ | 4559/7378 [15:38:39<9:36:03, 12.26s/it] + +{'loss': 0.4103, 'learning_rate': 6.729194649021915e-06, 'epoch': 0.62} + + 62%|██████▏ | 4559/7378 [15:38:39<9:36:03, 12.26s/it] + 62%|██████▏ | 4560/7378 [15:38:51<9:37:14, 12.29s/it] + +{'loss': 0.3923, 'learning_rate': 6.725046286377004e-06, 'epoch': 0.62} + + 62%|██████▏ | 4560/7378 [15:38:51<9:37:14, 12.29s/it] + 62%|██████▏ | 4561/7378 [15:39:04<9:41:03, 12.38s/it] + +{'loss': 0.4464, 'learning_rate': 6.720898554928097e-06, 'epoch': 0.62} + + 62%|██████▏ | 4561/7378 [15:39:04<9:41:03, 12.38s/it] + 62%|██████▏ | 4562/7378 [15:39:16<9:39:53, 12.36s/it] + +{'loss': 0.433, 'learning_rate': 6.716751455474606e-06, 'epoch': 0.62} + + 62%|██████▏ | 4562/7378 [15:39:16<9:39:53, 12.36s/it] + 62%|██████▏ | 4563/7378 [15:39:29<9:40:57, 12.38s/it] + +{'loss': 0.4559, 'learning_rate': 6.712604988815815e-06, 'epoch': 0.62} + + 62%|██████▏ | 4563/7378 [15:39:29<9:40:57, 12.38s/it] + 
62%|██████▏ | 4564/7378 [15:39:41<9:47:19, 12.52s/it] + +{'loss': 0.4028, 'learning_rate': 6.708459155750892e-06, 'epoch': 0.62} + + 62%|██████▏ | 4564/7378 [15:39:41<9:47:19, 12.52s/it] + 62%|██████▏ | 4565/7378 [15:39:54<9:46:15, 12.50s/it] + +{'loss': 0.4361, 'learning_rate': 6.704313957078886e-06, 'epoch': 0.62} + + 62%|██████▏ | 4565/7378 [15:39:54<9:46:15, 12.50s/it] + 62%|██████▏ | 4566/7378 [15:40:06<9:46:13, 12.51s/it] + +{'loss': 0.464, 'learning_rate': 6.700169393598714e-06, 'epoch': 0.62} + + 62%|██████▏ | 4566/7378 [15:40:06<9:46:13, 12.51s/it] + 62%|██████▏ | 4567/7378 [15:40:19<9:46:20, 12.52s/it] + +{'loss': 0.4412, 'learning_rate': 6.696025466109181e-06, 'epoch': 0.62} + + 62%|██████▏ | 4567/7378 [15:40:19<9:46:20, 12.52s/it] + 62%|██████▏ | 4568/7378 [15:40:31<9:37:19, 12.33s/it] + +{'loss': 0.4272, 'learning_rate': 6.691882175408959e-06, 'epoch': 0.62} + + 62%|██████▏ | 4568/7378 [15:40:31<9:37:19, 12.33s/it] + 62%|██████▏ | 4569/7378 [15:40:43<9:31:05, 12.20s/it] + +{'loss': 0.4425, 'learning_rate': 6.6877395222966025e-06, 'epoch': 0.62} + + 62%|██████▏ | 4569/7378 [15:40:43<9:31:05, 12.20s/it] + 62%|██████▏ | 4570/7378 [15:40:55<9:33:04, 12.25s/it] + +{'loss': 0.3992, 'learning_rate': 6.683597507570545e-06, 'epoch': 0.62} + + 62%|██████▏ | 4570/7378 [15:40:55<9:33:04, 12.25s/it] + 62%|██████▏ | 4571/7378 [15:41:08<9:44:07, 12.49s/it] + +{'loss': 0.479, 'learning_rate': 6.679456132029094e-06, 'epoch': 0.62} + + 62%|██████▏ | 4571/7378 [15:41:08<9:44:07, 12.49s/it] + 62%|██████▏ | 4572/7378 [15:41:21<9:45:28, 12.52s/it] + +{'loss': 0.4398, 'learning_rate': 6.675315396470437e-06, 'epoch': 0.62} + + 62%|██████▏ | 4572/7378 [15:41:21<9:45:28, 12.52s/it] + 62%|██████▏ | 4573/7378 [15:41:33<9:42:15, 12.45s/it] + +{'loss': 0.429, 'learning_rate': 6.6711753016926305e-06, 'epoch': 0.62} + + 62%|██████▏ | 4573/7378 [15:41:33<9:42:15, 12.45s/it] + 62%|██████▏ | 4574/7378 [15:41:45<9:41:08, 12.44s/it] + +{'loss': 0.4698, 'learning_rate': 
6.667035848493619e-06, 'epoch': 0.62} + + 62%|██████▏ | 4574/7378 [15:41:45<9:41:08, 12.44s/it] + 62%|██████▏ | 4575/7378 [15:41:58<9:37:27, 12.36s/it] + +{'loss': 0.4975, 'learning_rate': 6.662897037671215e-06, 'epoch': 0.62} + + 62%|██████▏ | 4575/7378 [15:41:58<9:37:27, 12.36s/it] + 62%|██████▏ | 4576/7378 [15:42:10<9:37:37, 12.37s/it] + +{'loss': 0.4867, 'learning_rate': 6.658758870023105e-06, 'epoch': 0.62} + + 62%|██████▏ | 4576/7378 [15:42:10<9:37:37, 12.37s/it] + 62%|██████▏ | 4577/7378 [15:42:22<9:34:11, 12.30s/it] + +{'loss': 0.4055, 'learning_rate': 6.654621346346864e-06, 'epoch': 0.62} + + 62%|██████▏ | 4577/7378 [15:42:22<9:34:11, 12.30s/it] + 62%|██████▏ | 4578/7378 [15:42:35<9:36:30, 12.35s/it] + +{'loss': 0.5015, 'learning_rate': 6.650484467439928e-06, 'epoch': 0.62} + + 62%|██████▏ | 4578/7378 [15:42:35<9:36:30, 12.35s/it] + 62%|██████▏ | 4579/7378 [15:42:47<9:33:01, 12.28s/it] + +{'loss': 0.4312, 'learning_rate': 6.646348234099621e-06, 'epoch': 0.62} + + 62%|██████▏ | 4579/7378 [15:42:47<9:33:01, 12.28s/it] + 62%|██████▏ | 4580/7378 [15:42:59<9:36:42, 12.37s/it] + +{'loss': 0.4705, 'learning_rate': 6.642212647123132e-06, 'epoch': 0.62} + + 62%|██████▏ | 4580/7378 [15:42:59<9:36:42, 12.37s/it] + 62%|██████▏ | 4581/7378 [15:43:12<9:37:41, 12.39s/it] + +{'loss': 0.3821, 'learning_rate': 6.638077707307535e-06, 'epoch': 0.62} + + 62%|██████▏ | 4581/7378 [15:43:12<9:37:41, 12.39s/it] + 62%|██████▏ | 4582/7378 [15:43:24<9:34:50, 12.34s/it] + +{'loss': 0.5261, 'learning_rate': 6.633943415449771e-06, 'epoch': 0.62} + + 62%|██████▏ | 4582/7378 [15:43:24<9:34:50, 12.34s/it] + 62%|██████▏ | 4583/7378 [15:43:36<9:36:54, 12.38s/it] + +{'loss': 0.4019, 'learning_rate': 6.6298097723466625e-06, 'epoch': 0.62} + + 62%|██████▏ | 4583/7378 [15:43:36<9:36:54, 12.38s/it] + 62%|██████▏ | 4584/7378 [15:43:49<9:34:35, 12.34s/it] + +{'loss': 0.4459, 'learning_rate': 6.625676778794905e-06, 'epoch': 0.62} + + 62%|██████▏ | 4584/7378 [15:43:49<9:34:35, 12.34s/it] + 
62%|██████▏ | 4585/7378 [15:44:01<9:33:46, 12.33s/it] + +{'loss': 0.5227, 'learning_rate': 6.621544435591065e-06, 'epoch': 0.62} + + 62%|██████▏ | 4585/7378 [15:44:01<9:33:46, 12.33s/it] + 62%|██████▏ | 4586/7378 [15:44:13<9:31:21, 12.28s/it] + +{'loss': 0.4669, 'learning_rate': 6.617412743531592e-06, 'epoch': 0.62} + + 62%|██████▏ | 4586/7378 [15:44:13<9:31:21, 12.28s/it] + 62%|██████▏ | 4587/7378 [15:44:26<9:36:02, 12.38s/it] + +{'loss': 0.495, 'learning_rate': 6.613281703412798e-06, 'epoch': 0.62} + + 62%|██████▏ | 4587/7378 [15:44:26<9:36:02, 12.38s/it] + 62%|██████▏ | 4588/7378 [15:44:38<9:34:29, 12.35s/it] + +{'loss': 0.4353, 'learning_rate': 6.609151316030883e-06, 'epoch': 0.62} + + 62%|██████▏ | 4588/7378 [15:44:38<9:34:29, 12.35s/it] + 62%|██████▏ | 4589/7378 [15:44:51<9:39:13, 12.46s/it] + +{'loss': 0.4162, 'learning_rate': 6.60502158218191e-06, 'epoch': 0.62} + + 62%|██████▏ | 4589/7378 [15:44:51<9:39:13, 12.46s/it] + 62%|██████▏ | 4590/7378 [15:45:03<9:32:17, 12.32s/it] + +{'loss': 0.4217, 'learning_rate': 6.600892502661822e-06, 'epoch': 0.62} + + 62%|██████▏ | 4590/7378 [15:45:03<9:32:17, 12.32s/it] + 62%|██████▏ | 4591/7378 [15:45:15<9:31:37, 12.31s/it] + +{'loss': 0.4739, 'learning_rate': 6.596764078266433e-06, 'epoch': 0.62} + + 62%|██████▏ | 4591/7378 [15:45:15<9:31:37, 12.31s/it] + 62%|██████▏ | 4592/7378 [15:45:27<9:25:57, 12.19s/it] + +{'loss': 0.4693, 'learning_rate': 6.592636309791437e-06, 'epoch': 0.62} + + 62%|██████▏ | 4592/7378 [15:45:27<9:25:57, 12.19s/it] + 62%|██████▏ | 4593/7378 [15:45:39<9:23:31, 12.14s/it] + +{'loss': 0.4562, 'learning_rate': 6.5885091980323925e-06, 'epoch': 0.62} + + 62%|██████▏ | 4593/7378 [15:45:39<9:23:31, 12.14s/it] + 62%|██████▏ | 4594/7378 [15:45:51<9:23:53, 12.15s/it] + +{'loss': 0.3607, 'learning_rate': 6.584382743784734e-06, 'epoch': 0.62} + + 62%|██████▏ | 4594/7378 [15:45:51<9:23:53, 12.15s/it] + 62%|██████▏ | 4595/7378 [15:46:03<9:21:41, 12.11s/it] + +{'loss': 0.4759, 'learning_rate': 
6.580256947843775e-06, 'epoch': 0.62} + + 62%|██████▏ | 4595/7378 [15:46:03<9:21:41, 12.11s/it] + 62%|██████▏ | 4596/7378 [15:46:15<9:21:06, 12.10s/it] + +{'loss': 0.4985, 'learning_rate': 6.576131811004693e-06, 'epoch': 0.62} + + 62%|██████▏ | 4596/7378 [15:46:15<9:21:06, 12.10s/it] + 62%|██████▏ | 4597/7378 [15:46:28<9:24:48, 12.19s/it] + +{'loss': 0.4525, 'learning_rate': 6.5720073340625505e-06, 'epoch': 0.62} + + 62%|██████▏ | 4597/7378 [15:46:28<9:24:48, 12.19s/it] + 62%|██████▏ | 4598/7378 [15:46:40<9:25:50, 12.21s/it] + +{'loss': 0.4075, 'learning_rate': 6.567883517812268e-06, 'epoch': 0.62} + + 62%|██████▏ | 4598/7378 [15:46:40<9:25:50, 12.21s/it] + 62%|██████▏ | 4599/7378 [15:46:53<9:32:58, 12.37s/it] + +{'loss': 0.4665, 'learning_rate': 6.5637603630486545e-06, 'epoch': 0.62} + + 62%|██████▏ | 4599/7378 [15:46:53<9:32:58, 12.37s/it] + 62%|██████▏ | 4600/7378 [15:47:05<9:37:31, 12.47s/it] + +{'loss': 0.504, 'learning_rate': 6.559637870566378e-06, 'epoch': 0.62} + + 62%|██████▏ | 4600/7378 [15:47:05<9:37:31, 12.47s/it] + 62%|██████▏ | 4601/7378 [15:47:18<9:33:45, 12.40s/it] + +{'loss': 0.4776, 'learning_rate': 6.555516041159984e-06, 'epoch': 0.62} + + 62%|██████▏ | 4601/7378 [15:47:18<9:33:45, 12.40s/it] + 62%|██████▏ | 4602/7378 [15:47:30<9:29:11, 12.30s/it] + +{'loss': 0.4557, 'learning_rate': 6.551394875623893e-06, 'epoch': 0.62} + + 62%|██████▏ | 4602/7378 [15:47:30<9:29:11, 12.30s/it] + 62%|██████▏ | 4603/7378 [15:47:42<9:25:40, 12.23s/it] + +{'loss': 0.4974, 'learning_rate': 6.547274374752395e-06, 'epoch': 0.62} + + 62%|██████▏ | 4603/7378 [15:47:42<9:25:40, 12.23s/it] + 62%|██████▏ | 4604/7378 [15:47:54<9:24:11, 12.20s/it] + +{'loss': 0.4017, 'learning_rate': 6.5431545393396516e-06, 'epoch': 0.62} + + 62%|██████▏ | 4604/7378 [15:47:54<9:24:11, 12.20s/it] + 62%|██████▏ | 4605/7378 [15:48:06<9:21:52, 12.16s/it] + +{'loss': 0.4388, 'learning_rate': 6.5390353701796936e-06, 'epoch': 0.62} + + 62%|██████▏ | 4605/7378 [15:48:06<9:21:52, 12.16s/it] + 
62%|██████▏ | 4606/7378 [15:48:18<9:25:08, 12.23s/it] + +{'loss': 0.45, 'learning_rate': 6.534916868066431e-06, 'epoch': 0.62} + + 62%|██████▏ | 4606/7378 [15:48:18<9:25:08, 12.23s/it] + 62%|██████▏ | 4607/7378 [15:48:31<9:30:46, 12.36s/it] + +{'loss': 0.4541, 'learning_rate': 6.530799033793636e-06, 'epoch': 0.62} + + 62%|██████▏ | 4607/7378 [15:48:31<9:30:46, 12.36s/it] + 62%|██████▏ | 4608/7378 [15:48:43<9:28:17, 12.31s/it] + +{'loss': 0.4367, 'learning_rate': 6.526681868154958e-06, 'epoch': 0.62} + + 62%|██████▏ | 4608/7378 [15:48:43<9:28:17, 12.31s/it] + 62%|██████▏ | 4609/7378 [15:48:55<9:27:03, 12.29s/it] + +{'loss': 0.4078, 'learning_rate': 6.522565371943921e-06, 'epoch': 0.62} + + 62%|██████▏ | 4609/7378 [15:48:55<9:27:03, 12.29s/it] + 62%|██████▏ | 4610/7378 [15:49:08<9:27:11, 12.29s/it] + +{'loss': 0.4239, 'learning_rate': 6.518449545953911e-06, 'epoch': 0.62} + + 62%|██████▏ | 4610/7378 [15:49:08<9:27:11, 12.29s/it] + 62%|██████▏ | 4611/7378 [15:49:20<9:27:10, 12.30s/it] + +{'loss': 0.4172, 'learning_rate': 6.514334390978188e-06, 'epoch': 0.62} + + 62%|██████▏ | 4611/7378 [15:49:20<9:27:10, 12.30s/it] + 63%|██████▎ | 4612/7378 [15:49:32<9:25:44, 12.27s/it] + +{'loss': 0.4171, 'learning_rate': 6.510219907809885e-06, 'epoch': 0.63} + + 63%|██████▎ | 4612/7378 [15:49:32<9:25:44, 12.27s/it] + 63%|██████▎ | 4613/7378 [15:49:45<9:27:08, 12.31s/it] + +{'loss': 0.429, 'learning_rate': 6.506106097242003e-06, 'epoch': 0.63} + + 63%|██████▎ | 4613/7378 [15:49:45<9:27:08, 12.31s/it] + 63%|██████▎ | 4614/7378 [15:49:57<9:24:46, 12.26s/it] + +{'loss': 0.4547, 'learning_rate': 6.501992960067418e-06, 'epoch': 0.63} + + 63%|██████▎ | 4614/7378 [15:49:57<9:24:46, 12.26s/it] + 63%|██████▎ | 4615/7378 [15:50:09<9:23:08, 12.23s/it] + +{'loss': 0.4369, 'learning_rate': 6.497880497078868e-06, 'epoch': 0.63} + + 63%|██████▎ | 4615/7378 [15:50:09<9:23:08, 12.23s/it] + 63%|██████▎ | 4616/7378 [15:50:21<9:24:52, 12.27s/it] + +{'loss': 0.4533, 'learning_rate': 
6.493768709068969e-06, 'epoch': 0.63} + + 63%|██████▎ | 4616/7378 [15:50:21<9:24:52, 12.27s/it] + 63%|██████▎ | 4617/7378 [15:50:34<9:27:13, 12.33s/it] + +{'loss': 0.4511, 'learning_rate': 6.489657596830201e-06, 'epoch': 0.63} + + 63%|██████▎ | 4617/7378 [15:50:34<9:27:13, 12.33s/it] + 63%|██████▎ | 4618/7378 [15:50:46<9:31:24, 12.42s/it] + +{'loss': 0.4371, 'learning_rate': 6.485547161154922e-06, 'epoch': 0.63} + + 63%|██████▎ | 4618/7378 [15:50:46<9:31:24, 12.42s/it] + 63%|██████▎ | 4619/7378 [15:50:59<9:32:26, 12.45s/it] + +{'loss': 0.4333, 'learning_rate': 6.481437402835349e-06, 'epoch': 0.63} + + 63%|██████▎ | 4619/7378 [15:50:59<9:32:26, 12.45s/it] + 63%|██████▎ | 4620/7378 [15:51:11<9:28:57, 12.38s/it] + +{'loss': 0.3893, 'learning_rate': 6.477328322663572e-06, 'epoch': 0.63} + + 63%|██████▎ | 4620/7378 [15:51:11<9:28:57, 12.38s/it] + 63%|██████▎ | 4621/7378 [15:51:24<9:35:21, 12.52s/it] + +{'loss': 0.522, 'learning_rate': 6.473219921431557e-06, 'epoch': 0.63} + + 63%|██████▎ | 4621/7378 [15:51:24<9:35:21, 12.52s/it] + 63%|██████▎ | 4622/7378 [15:51:36<9:28:41, 12.38s/it] + +{'loss': 0.4727, 'learning_rate': 6.469112199931131e-06, 'epoch': 0.63} + + 63%|██████▎ | 4622/7378 [15:51:36<9:28:41, 12.38s/it] + 63%|██████▎ | 4623/7378 [15:51:49<9:34:30, 12.51s/it] + +{'loss': 0.5042, 'learning_rate': 6.465005158953994e-06, 'epoch': 0.63} + + 63%|██████▎ | 4623/7378 [15:51:49<9:34:30, 12.51s/it] + 63%|██████▎ | 4624/7378 [15:52:01<9:29:30, 12.41s/it] + +{'loss': 0.4235, 'learning_rate': 6.460898799291711e-06, 'epoch': 0.63} + + 63%|██████▎ | 4624/7378 [15:52:01<9:29:30, 12.41s/it] + 63%|██████▎ | 4625/7378 [15:52:14<9:31:46, 12.46s/it] + +{'loss': 0.4931, 'learning_rate': 6.456793121735724e-06, 'epoch': 0.63} + + 63%|██████▎ | 4625/7378 [15:52:14<9:31:46, 12.46s/it] + 63%|██████▎ | 4626/7378 [15:52:26<9:28:37, 12.40s/it] + +{'loss': 0.3962, 'learning_rate': 6.452688127077333e-06, 'epoch': 0.63} + + 63%|██████▎ | 4626/7378 [15:52:26<9:28:37, 12.40s/it] + 63%|██████▎ 
| 4627/7378 [15:52:38<9:26:39, 12.36s/it] + +{'loss': 0.47, 'learning_rate': 6.448583816107713e-06, 'epoch': 0.63} + + 63%|██████▎ | 4627/7378 [15:52:38<9:26:39, 12.36s/it] + 63%|██████▎ | 4628/7378 [15:52:50<9:22:58, 12.28s/it] + +{'loss': 0.3917, 'learning_rate': 6.444480189617908e-06, 'epoch': 0.63} + + 63%|██████▎ | 4628/7378 [15:52:50<9:22:58, 12.28s/it] + 63%|██████▎ | 4629/7378 [15:53:02<9:22:56, 12.29s/it] + +{'loss': 0.4693, 'learning_rate': 6.440377248398821e-06, 'epoch': 0.63} + + 63%|██████▎ | 4629/7378 [15:53:02<9:22:56, 12.29s/it] + 63%|██████▎ | 4630/7378 [15:53:15<9:27:48, 12.40s/it] + +{'loss': 0.4627, 'learning_rate': 6.436274993241238e-06, 'epoch': 0.63} + + 63%|██████▎ | 4630/7378 [15:53:15<9:27:48, 12.40s/it] + 63%|██████▎ | 4631/7378 [15:53:27<9:27:01, 12.38s/it] + +{'loss': 0.4731, 'learning_rate': 6.432173424935797e-06, 'epoch': 0.63} + + 63%|██████▎ | 4631/7378 [15:53:28<9:27:01, 12.38s/it] + 63%|██████▎ | 4632/7378 [15:53:40<9:23:01, 12.30s/it] + +{'loss': 0.4424, 'learning_rate': 6.428072544273019e-06, 'epoch': 0.63} + + 63%|██████▎ | 4632/7378 [15:53:40<9:23:01, 12.30s/it] + 63%|██████▎ | 4633/7378 [15:53:53<9:37:07, 12.61s/it] + +{'loss': 0.4183, 'learning_rate': 6.423972352043275e-06, 'epoch': 0.63} + + 63%|██████▎ | 4633/7378 [15:53:53<9:37:07, 12.61s/it] + 63%|██████▎ | 4634/7378 [15:54:05<9:35:56, 12.59s/it] + +{'loss': 0.4616, 'learning_rate': 6.419872849036821e-06, 'epoch': 0.63} + + 63%|██████▎ | 4634/7378 [15:54:05<9:35:56, 12.59s/it] + 63%|██████▎ | 4635/7378 [15:54:18<9:30:38, 12.48s/it] + +{'loss': 0.4452, 'learning_rate': 6.4157740360437695e-06, 'epoch': 0.63} + + 63%|██████▎ | 4635/7378 [15:54:18<9:30:38, 12.48s/it] + 63%|██████▎ | 4636/7378 [15:54:30<9:25:05, 12.37s/it] + +{'loss': 0.4597, 'learning_rate': 6.411675913854098e-06, 'epoch': 0.63} + + 63%|██████▎ | 4636/7378 [15:54:30<9:25:05, 12.37s/it] + 63%|██████▎ | 4637/7378 [15:54:42<9:22:19, 12.31s/it] + +{'loss': 0.4883, 'learning_rate': 6.407578483257662e-06, 'epoch': 
0.63} + + 63%|██████▎ | 4637/7378 [15:54:42<9:22:19, 12.31s/it] + 63%|██████▎ | 4638/7378 [15:54:54<9:18:44, 12.24s/it] + +{'loss': 0.4774, 'learning_rate': 6.403481745044171e-06, 'epoch': 0.63} + + 63%|██████▎ | 4638/7378 [15:54:54<9:18:44, 12.24s/it] + 63%|██████▎ | 4639/7378 [15:55:06<9:17:33, 12.21s/it] + +{'loss': 0.4574, 'learning_rate': 6.399385700003213e-06, 'epoch': 0.63} + + 63%|██████▎ | 4639/7378 [15:55:06<9:17:33, 12.21s/it] + 63%|██████▎ | 4640/7378 [15:55:18<9:17:38, 12.22s/it] + +{'loss': 0.4576, 'learning_rate': 6.395290348924232e-06, 'epoch': 0.63} + + 63%|██████▎ | 4640/7378 [15:55:18<9:17:38, 12.22s/it] + 63%|██████▎ | 4641/7378 [15:55:30<9:12:48, 12.12s/it] + +{'loss': 0.4269, 'learning_rate': 6.391195692596546e-06, 'epoch': 0.63} + + 63%|██████▎ | 4641/7378 [15:55:30<9:12:48, 12.12s/it] + 63%|██████▎ | 4642/7378 [15:55:43<9:14:17, 12.16s/it] + +{'loss': 0.4676, 'learning_rate': 6.387101731809332e-06, 'epoch': 0.63} + + 63%|██████▎ | 4642/7378 [15:55:43<9:14:17, 12.16s/it] + 63%|██████▎ | 4643/7378 [15:55:55<9:15:07, 12.18s/it] + +{'loss': 0.4509, 'learning_rate': 6.3830084673516415e-06, 'epoch': 0.63} + + 63%|██████▎ | 4643/7378 [15:55:55<9:15:07, 12.18s/it] + 63%|██████▎ | 4644/7378 [15:56:08<9:22:46, 12.35s/it] + +{'loss': 0.4291, 'learning_rate': 6.378915900012383e-06, 'epoch': 0.63} + + 63%|██████▎ | 4644/7378 [15:56:08<9:22:46, 12.35s/it] + 63%|██████▎ | 4645/7378 [15:56:20<9:24:43, 12.40s/it] + +{'loss': 0.4188, 'learning_rate': 6.374824030580336e-06, 'epoch': 0.63} + + 63%|██████▎ | 4645/7378 [15:56:20<9:24:43, 12.40s/it] + 63%|██████▎ | 4646/7378 [15:56:32<9:20:00, 12.30s/it] + +{'loss': 0.4843, 'learning_rate': 6.370732859844145e-06, 'epoch': 0.63} + + 63%|██████▎ | 4646/7378 [15:56:32<9:20:00, 12.30s/it] + 63%|██████▎ | 4647/7378 [15:56:44<9:12:03, 12.13s/it] + +{'loss': 0.3995, 'learning_rate': 6.366642388592317e-06, 'epoch': 0.63} + + 63%|██████▎ | 4647/7378 [15:56:44<9:12:03, 12.13s/it] + 63%|██████▎ | 4648/7378 [15:56:57<9:29:33, 
12.52s/it] + +{'loss': 0.479, 'learning_rate': 6.36255261761323e-06, 'epoch': 0.63} + + 63%|██████▎ | 4648/7378 [15:56:57<9:29:33, 12.52s/it] + 63%|██████▎ | 4649/7378 [15:57:10<9:31:09, 12.56s/it] + +{'loss': 0.4813, 'learning_rate': 6.3584635476951195e-06, 'epoch': 0.63} + + 63%|██████▎ | 4649/7378 [15:57:10<9:31:09, 12.56s/it] + 63%|██████▎ | 4650/7378 [15:57:22<9:28:51, 12.51s/it] + +{'loss': 0.4513, 'learning_rate': 6.354375179626092e-06, 'epoch': 0.63} + + 63%|██████▎ | 4650/7378 [15:57:22<9:28:51, 12.51s/it] + 63%|██████▎ | 4651/7378 [15:57:35<9:28:12, 12.50s/it] + +{'loss': 0.4189, 'learning_rate': 6.350287514194112e-06, 'epoch': 0.63} + + 63%|██████▎ | 4651/7378 [15:57:35<9:28:12, 12.50s/it] + 63%|██████▎ | 4652/7378 [15:57:47<9:26:14, 12.46s/it] + +{'loss': 0.4357, 'learning_rate': 6.346200552187019e-06, 'epoch': 0.63} + + 63%|██████▎ | 4652/7378 [15:57:47<9:26:14, 12.46s/it] + 63%|██████▎ | 4653/7378 [15:58:00<9:29:46, 12.55s/it] + +{'loss': 0.4169, 'learning_rate': 6.342114294392509e-06, 'epoch': 0.63} + + 63%|██████▎ | 4653/7378 [15:58:00<9:29:46, 12.55s/it] + 63%|██████▎ | 4654/7378 [15:58:12<9:29:36, 12.55s/it] + +{'loss': 0.4281, 'learning_rate': 6.338028741598144e-06, 'epoch': 0.63} + + 63%|██████▎ | 4654/7378 [15:58:12<9:29:36, 12.55s/it] + 63%|██████▎ | 4655/7378 [15:58:25<9:27:41, 12.51s/it] + +{'loss': 0.5089, 'learning_rate': 6.333943894591349e-06, 'epoch': 0.63} + + 63%|██████▎ | 4655/7378 [15:58:25<9:27:41, 12.51s/it] + 63%|██████▎ | 4656/7378 [15:58:37<9:21:51, 12.38s/it] + +{'loss': 0.4424, 'learning_rate': 6.3298597541594155e-06, 'epoch': 0.63} + + 63%|██████▎ | 4656/7378 [15:58:37<9:21:51, 12.38s/it] + 63%|██████▎ | 4657/7378 [15:58:49<9:17:29, 12.29s/it] + +{'loss': 0.4541, 'learning_rate': 6.325776321089496e-06, 'epoch': 0.63} + + 63%|██████▎ | 4657/7378 [15:58:49<9:17:29, 12.29s/it] + 63%|██████▎ | 4658/7378 [15:59:02<9:26:53, 12.50s/it] + +{'loss': 0.4658, 'learning_rate': 6.321693596168611e-06, 'epoch': 0.63} + + 63%|██████▎ | 
4658/7378 [15:59:02<9:26:53, 12.50s/it] + 63%|██████▎ | 4659/7378 [15:59:15<9:27:27, 12.52s/it] + +{'loss': 0.3938, 'learning_rate': 6.317611580183638e-06, 'epoch': 0.63} + + 63%|██████▎ | 4659/7378 [15:59:15<9:27:27, 12.52s/it] + 63%|██████▎ | 4660/7378 [15:59:27<9:20:58, 12.38s/it] + +{'loss': 0.4558, 'learning_rate': 6.313530273921325e-06, 'epoch': 0.63} + + 63%|██████▎ | 4660/7378 [15:59:27<9:20:58, 12.38s/it] + 63%|██████▎ | 4661/7378 [15:59:39<9:18:12, 12.33s/it] + +{'loss': 0.4487, 'learning_rate': 6.30944967816828e-06, 'epoch': 0.63} + + 63%|██████▎ | 4661/7378 [15:59:39<9:18:12, 12.33s/it] + 63%|██████▎ | 4662/7378 [15:59:51<9:18:09, 12.33s/it] + +{'loss': 0.4492, 'learning_rate': 6.30536979371097e-06, 'epoch': 0.63} + + 63%|██████▎ | 4662/7378 [15:59:51<9:18:09, 12.33s/it] + 63%|██████▎ | 4663/7378 [16:00:03<9:15:43, 12.28s/it] + +{'loss': 0.4187, 'learning_rate': 6.3012906213357316e-06, 'epoch': 0.63} + + 63%|██████▎ | 4663/7378 [16:00:03<9:15:43, 12.28s/it] + 63%|██████▎ | 4664/7378 [16:00:15<9:12:17, 12.21s/it] + +{'loss': 0.4334, 'learning_rate': 6.297212161828761e-06, 'epoch': 0.63} + + 63%|██████▎ | 4664/7378 [16:00:15<9:12:17, 12.21s/it] + 63%|██████▎ | 4665/7378 [16:00:28<9:19:15, 12.37s/it] + +{'loss': 0.4793, 'learning_rate': 6.2931344159761165e-06, 'epoch': 0.63} + + 63%|██████▎ | 4665/7378 [16:00:28<9:19:15, 12.37s/it] + 63%|██████▎ | 4666/7378 [16:00:40<9:17:08, 12.33s/it] + +{'loss': 0.373, 'learning_rate': 6.289057384563721e-06, 'epoch': 0.63} + + 63%|██████▎ | 4666/7378 [16:00:40<9:17:08, 12.33s/it] + 63%|██████▎ | 4667/7378 [16:00:53<9:15:25, 12.29s/it] + +{'loss': 0.489, 'learning_rate': 6.284981068377359e-06, 'epoch': 0.63} + + 63%|██████▎ | 4667/7378 [16:00:53<9:15:25, 12.29s/it] + 63%|██████▎ | 4668/7378 [16:01:05<9:11:58, 12.22s/it] + +{'loss': 0.4299, 'learning_rate': 6.280905468202674e-06, 'epoch': 0.63} + + 63%|██████▎ | 4668/7378 [16:01:05<9:11:58, 12.22s/it] + 63%|██████▎ | 4669/7378 [16:01:17<9:10:24, 12.19s/it] + +{'loss': 
0.3885, 'learning_rate': 6.276830584825175e-06, 'epoch': 0.63} + + 63%|██████▎ | 4669/7378 [16:01:17<9:10:24, 12.19s/it] + 63%|██████▎ | 4670/7378 [16:01:29<9:11:48, 12.23s/it] + +{'loss': 0.4188, 'learning_rate': 6.272756419030235e-06, 'epoch': 0.63} + + 63%|██████▎ | 4670/7378 [16:01:29<9:11:48, 12.23s/it] + 63%|██████▎ | 4671/7378 [16:01:41<9:08:07, 12.15s/it] + +{'loss': 0.4105, 'learning_rate': 6.268682971603081e-06, 'epoch': 0.63} + + 63%|██████▎ | 4671/7378 [16:01:41<9:08:07, 12.15s/it] + 63%|██████▎ | 4672/7378 [16:01:54<9:13:23, 12.27s/it] + +{'loss': 0.4949, 'learning_rate': 6.264610243328808e-06, 'epoch': 0.63} + + 63%|██████▎ | 4672/7378 [16:01:54<9:13:23, 12.27s/it] + 63%|██████▎ | 4673/7378 [16:02:05<9:07:15, 12.14s/it] + +{'loss': 0.4785, 'learning_rate': 6.26053823499237e-06, 'epoch': 0.63} + + 63%|██████▎ | 4673/7378 [16:02:05<9:07:15, 12.14s/it] + 63%|██████▎ | 4674/7378 [16:02:18<9:07:16, 12.14s/it] + +{'loss': 0.4355, 'learning_rate': 6.256466947378586e-06, 'epoch': 0.63} + + 63%|██████▎ | 4674/7378 [16:02:18<9:07:16, 12.14s/it] + 63%|██████▎ | 4675/7378 [16:02:30<9:10:44, 12.23s/it] + +{'loss': 0.4353, 'learning_rate': 6.252396381272129e-06, 'epoch': 0.63} + + 63%|██████▎ | 4675/7378 [16:02:30<9:10:44, 12.23s/it] + 63%|██████▎ | 4676/7378 [16:02:42<9:14:05, 12.30s/it] + +{'loss': 0.4383, 'learning_rate': 6.248326537457538e-06, 'epoch': 0.63} + + 63%|██████▎ | 4676/7378 [16:02:42<9:14:05, 12.30s/it] + 63%|██████▎ | 4677/7378 [16:02:55<9:10:06, 12.22s/it] + +{'loss': 0.3767, 'learning_rate': 6.2442574167192125e-06, 'epoch': 0.63} + + 63%|██████▎ | 4677/7378 [16:02:55<9:10:06, 12.22s/it] + 63%|██████▎ | 4678/7378 [16:03:07<9:09:13, 12.20s/it] + +{'loss': 0.4823, 'learning_rate': 6.240189019841411e-06, 'epoch': 0.63} + + 63%|██████▎ | 4678/7378 [16:03:07<9:09:13, 12.20s/it] + 63%|██████▎ | 4679/7378 [16:03:19<9:13:14, 12.30s/it] + +{'loss': 0.4339, 'learning_rate': 6.2361213476082534e-06, 'epoch': 0.63} + + 63%|██████▎ | 4679/7378 
[16:03:19<9:13:14, 12.30s/it] + 63%|██████▎ | 4680/7378 [16:03:32<9:16:11, 12.37s/it] + +{'loss': 0.4489, 'learning_rate': 6.232054400803719e-06, 'epoch': 0.63} + + 63%|██████▎ | 4680/7378 [16:03:32<9:16:11, 12.37s/it] + 63%|██████▎ | 4681/7378 [16:03:44<9:11:16, 12.26s/it] + +{'loss': 0.4639, 'learning_rate': 6.22798818021165e-06, 'epoch': 0.63} + + 63%|██████▎ | 4681/7378 [16:03:44<9:11:16, 12.26s/it] + 63%|██████▎ | 4682/7378 [16:03:56<9:06:32, 12.16s/it] + +{'loss': 0.4343, 'learning_rate': 6.223922686615743e-06, 'epoch': 0.63} + + 63%|██████▎ | 4682/7378 [16:03:56<9:06:32, 12.16s/it] + 63%|██████▎ | 4683/7378 [16:04:08<9:10:43, 12.26s/it] + +{'loss': 0.4696, 'learning_rate': 6.219857920799564e-06, 'epoch': 0.63} + + 63%|██████▎ | 4683/7378 [16:04:08<9:10:43, 12.26s/it] + 63%|██████▎ | 4684/7378 [16:04:20<9:07:06, 12.18s/it] + +{'loss': 0.5098, 'learning_rate': 6.215793883546526e-06, 'epoch': 0.63} + + 63%|██████▎ | 4684/7378 [16:04:20<9:07:06, 12.18s/it] + 63%|██████▎ | 4685/7378 [16:04:32<9:06:09, 12.17s/it] + +{'loss': 0.4467, 'learning_rate': 6.211730575639914e-06, 'epoch': 0.63} + + 63%|██████▎ | 4685/7378 [16:04:32<9:06:09, 12.17s/it] + 64%|██████▎ | 4686/7378 [16:04:45<9:08:02, 12.21s/it] + +{'loss': 0.4494, 'learning_rate': 6.207667997862866e-06, 'epoch': 0.64} + + 64%|██████▎ | 4686/7378 [16:04:45<9:08:02, 12.21s/it] + 64%|██████▎ | 4687/7378 [16:04:57<9:07:06, 12.20s/it] + +{'loss': 0.4188, 'learning_rate': 6.203606150998377e-06, 'epoch': 0.64} + + 64%|██████▎ | 4687/7378 [16:04:57<9:07:06, 12.20s/it] + 64%|██████▎ | 4688/7378 [16:05:09<9:11:51, 12.31s/it] + +{'loss': 0.5108, 'learning_rate': 6.1995450358293085e-06, 'epoch': 0.64} + + 64%|██████▎ | 4688/7378 [16:05:09<9:11:51, 12.31s/it] + 64%|██████▎ | 4689/7378 [16:05:21<9:07:54, 12.23s/it] + +{'loss': 0.4294, 'learning_rate': 6.195484653138372e-06, 'epoch': 0.64} + + 64%|██████▎ | 4689/7378 [16:05:21<9:07:54, 12.23s/it] + 64%|██████▎ | 4690/7378 [16:05:33<9:04:33, 12.16s/it] + +{'loss': 0.4342, 
'learning_rate': 6.1914250037081465e-06, 'epoch': 0.64} + + 64%|██████▎ | 4690/7378 [16:05:33<9:04:33, 12.16s/it] + 64%|██████▎ | 4691/7378 [16:05:46<9:07:21, 12.22s/it] + +{'loss': 0.4637, 'learning_rate': 6.187366088321065e-06, 'epoch': 0.64} + + 64%|██████▎ | 4691/7378 [16:05:46<9:07:21, 12.22s/it] + 64%|██████▎ | 4692/7378 [16:05:58<9:09:35, 12.28s/it] + +{'loss': 0.4397, 'learning_rate': 6.1833079077594215e-06, 'epoch': 0.64} + + 64%|██████▎ | 4692/7378 [16:05:58<9:09:35, 12.28s/it] + 64%|██████▎ | 4693/7378 [16:06:10<9:08:43, 12.26s/it] + +{'loss': 0.4328, 'learning_rate': 6.179250462805362e-06, 'epoch': 0.64} + + 64%|██████▎ | 4693/7378 [16:06:10<9:08:43, 12.26s/it] + 64%|██████▎ | 4694/7378 [16:06:23<9:09:52, 12.29s/it] + +{'loss': 0.4972, 'learning_rate': 6.175193754240899e-06, 'epoch': 0.64} + + 64%|██████▎ | 4694/7378 [16:06:23<9:09:52, 12.29s/it] + 64%|██████▎ | 4695/7378 [16:06:35<9:07:16, 12.24s/it] + +{'loss': 0.5075, 'learning_rate': 6.171137782847895e-06, 'epoch': 0.64} + + 64%|██████▎ | 4695/7378 [16:06:35<9:07:16, 12.24s/it] + 64%|██████▎ | 4696/7378 [16:06:47<9:09:46, 12.30s/it] + +{'loss': 0.4259, 'learning_rate': 6.1670825494080834e-06, 'epoch': 0.64} + + 64%|██████▎ | 4696/7378 [16:06:47<9:09:46, 12.30s/it] + 64%|██████▎ | 4697/7378 [16:07:00<9:19:10, 12.51s/it] + +{'loss': 0.3806, 'learning_rate': 6.163028054703041e-06, 'epoch': 0.64} + + 64%|██████▎ | 4697/7378 [16:07:00<9:19:10, 12.51s/it] + 64%|██████▎ | 4698/7378 [16:07:12<9:12:26, 12.37s/it] + +{'loss': 0.4316, 'learning_rate': 6.15897429951421e-06, 'epoch': 0.64} + + 64%|██████▎ | 4698/7378 [16:07:12<9:12:26, 12.37s/it] + 64%|██████▎ | 4699/7378 [16:07:25<9:14:35, 12.42s/it] + +{'loss': 0.4314, 'learning_rate': 6.154921284622886e-06, 'epoch': 0.64} + + 64%|██████▎ | 4699/7378 [16:07:25<9:14:35, 12.42s/it] + 64%|██████▎ | 4700/7378 [16:07:37<9:14:49, 12.43s/it] + +{'loss': 0.4479, 'learning_rate': 6.150869010810227e-06, 'epoch': 0.64} + + 64%|██████▎ | 4700/7378 [16:07:37<9:14:49, 
12.43s/it] + 64%|██████▎ | 4701/7378 [16:07:50<9:13:34, 12.41s/it] + +{'loss': 0.4184, 'learning_rate': 6.146817478857241e-06, 'epoch': 0.64} + + 64%|██████▎ | 4701/7378 [16:07:50<9:13:34, 12.41s/it] + 64%|██████▎ | 4702/7378 [16:08:02<9:16:06, 12.47s/it] + +{'loss': 0.4221, 'learning_rate': 6.142766689544804e-06, 'epoch': 0.64} + + 64%|██████▎ | 4702/7378 [16:08:02<9:16:06, 12.47s/it] + 64%|██████▎ | 4703/7378 [16:08:15<9:13:03, 12.41s/it] + +{'loss': 0.4291, 'learning_rate': 6.138716643653634e-06, 'epoch': 0.64} + + 64%|██████▎ | 4703/7378 [16:08:15<9:13:03, 12.41s/it] + 64%|██████▍ | 4704/7378 [16:08:27<9:08:00, 12.30s/it] + +{'loss': 0.427, 'learning_rate': 6.134667341964321e-06, 'epoch': 0.64} + + 64%|██████▍ | 4704/7378 [16:08:27<9:08:00, 12.30s/it] + 64%|██████▍ | 4705/7378 [16:08:39<9:05:48, 12.25s/it] + +{'loss': 0.4034, 'learning_rate': 6.130618785257302e-06, 'epoch': 0.64} + + 64%|██████▍ | 4705/7378 [16:08:39<9:05:48, 12.25s/it] + 64%|██████▍ | 4706/7378 [16:08:51<9:04:54, 12.24s/it] + +{'loss': 0.4306, 'learning_rate': 6.12657097431287e-06, 'epoch': 0.64} + + 64%|██████▍ | 4706/7378 [16:08:51<9:04:54, 12.24s/it] + 64%|██████▍ | 4707/7378 [16:09:04<9:09:44, 12.35s/it] + +{'loss': 0.4779, 'learning_rate': 6.122523909911182e-06, 'epoch': 0.64} + + 64%|██████▍ | 4707/7378 [16:09:04<9:09:44, 12.35s/it] + 64%|██████▍ | 4708/7378 [16:09:16<9:09:42, 12.35s/it] + +{'loss': 0.5077, 'learning_rate': 6.11847759283224e-06, 'epoch': 0.64} + + 64%|██████▍ | 4708/7378 [16:09:16<9:09:42, 12.35s/it] + 64%|██████▍ | 4709/7378 [16:09:28<9:10:27, 12.37s/it] + +{'loss': 0.4158, 'learning_rate': 6.114432023855916e-06, 'epoch': 0.64} + + 64%|██████▍ | 4709/7378 [16:09:28<9:10:27, 12.37s/it] + 64%|██████▍ | 4710/7378 [16:09:40<9:06:09, 12.28s/it] + +{'loss': 0.4627, 'learning_rate': 6.1103872037619225e-06, 'epoch': 0.64} + + 64%|██████▍ | 4710/7378 [16:09:40<9:06:09, 12.28s/it] + 64%|██████▍ | 4711/7378 [16:09:52<9:02:30, 12.20s/it] + +{'loss': 0.4965, 'learning_rate': 
6.106343133329841e-06, 'epoch': 0.64} + + 64%|██████▍ | 4711/7378 [16:09:52<9:02:30, 12.20s/it] + 64%|██████▍ | 4712/7378 [16:10:05<9:05:13, 12.27s/it] + +{'loss': 0.5428, 'learning_rate': 6.102299813339101e-06, 'epoch': 0.64} + + 64%|██████▍ | 4712/7378 [16:10:05<9:05:13, 12.27s/it] + 64%|██████▍ | 4713/7378 [16:10:17<9:03:39, 12.24s/it] + +{'loss': 0.4652, 'learning_rate': 6.098257244568986e-06, 'epoch': 0.64} + + 64%|██████▍ | 4713/7378 [16:10:17<9:03:39, 12.24s/it] + 64%|██████▍ | 4714/7378 [16:10:29<9:01:20, 12.19s/it] + +{'loss': 0.5102, 'learning_rate': 6.094215427798643e-06, 'epoch': 0.64} + + 64%|██████▍ | 4714/7378 [16:10:29<9:01:20, 12.19s/it] + 64%|██████▍ | 4715/7378 [16:10:41<8:59:50, 12.16s/it] + +{'loss': 0.4227, 'learning_rate': 6.090174363807063e-06, 'epoch': 0.64} + + 64%|██████▍ | 4715/7378 [16:10:41<8:59:50, 12.16s/it] + 64%|██████▍ | 4716/7378 [16:10:53<8:58:27, 12.14s/it] + +{'loss': 0.5286, 'learning_rate': 6.086134053373103e-06, 'epoch': 0.64} + + 64%|██████▍ | 4716/7378 [16:10:53<8:58:27, 12.14s/it] + 64%|██████▍ | 4717/7378 [16:11:06<9:00:29, 12.19s/it] + +{'loss': 0.4328, 'learning_rate': 6.082094497275466e-06, 'epoch': 0.64} + + 64%|██████▍ | 4717/7378 [16:11:06<9:00:29, 12.19s/it] + 64%|██████▍ | 4718/7378 [16:11:18<9:03:36, 12.26s/it] + +{'loss': 0.5108, 'learning_rate': 6.078055696292715e-06, 'epoch': 0.64} + + 64%|██████▍ | 4718/7378 [16:11:18<9:03:36, 12.26s/it] + 64%|██████▍ | 4719/7378 [16:11:30<9:02:15, 12.24s/it] + +{'loss': 0.4473, 'learning_rate': 6.074017651203265e-06, 'epoch': 0.64} + + 64%|██████▍ | 4719/7378 [16:11:30<9:02:15, 12.24s/it] + 64%|██████▍ | 4720/7378 [16:11:43<9:02:55, 12.26s/it] + +{'loss': 0.4208, 'learning_rate': 6.069980362785386e-06, 'epoch': 0.64} + + 64%|██████▍ | 4720/7378 [16:11:43<9:02:55, 12.26s/it] + 64%|██████▍ | 4721/7378 [16:11:55<9:02:27, 12.25s/it] + +{'loss': 0.4593, 'learning_rate': 6.065943831817202e-06, 'epoch': 0.64} + + 64%|██████▍ | 4721/7378 [16:11:55<9:02:27, 12.25s/it] + 64%|██████▍ 
| 4722/7378 [16:12:07<9:03:18, 12.27s/it] + +{'loss': 0.488, 'learning_rate': 6.061908059076691e-06, 'epoch': 0.64} + + 64%|██████▍ | 4722/7378 [16:12:07<9:03:18, 12.27s/it] + 64%|██████▍ | 4723/7378 [16:12:19<9:02:38, 12.26s/it] + +{'loss': 0.3907, 'learning_rate': 6.057873045341686e-06, 'epoch': 0.64} + + 64%|██████▍ | 4723/7378 [16:12:19<9:02:38, 12.26s/it] + 64%|██████▍ | 4724/7378 [16:12:31<9:00:27, 12.22s/it] + +{'loss': 0.4383, 'learning_rate': 6.05383879138987e-06, 'epoch': 0.64} + + 64%|██████▍ | 4724/7378 [16:12:31<9:00:27, 12.22s/it] + 64%|██████▍ | 4725/7378 [16:12:44<9:00:13, 12.22s/it] + +{'loss': 0.4722, 'learning_rate': 6.049805297998785e-06, 'epoch': 0.64} + + 64%|██████▍ | 4725/7378 [16:12:44<9:00:13, 12.22s/it] + 64%|██████▍ | 4726/7378 [16:12:56<9:03:56, 12.31s/it] + +{'loss': 0.4097, 'learning_rate': 6.04577256594582e-06, 'epoch': 0.64} + + 64%|██████▍ | 4726/7378 [16:12:56<9:03:56, 12.31s/it] + 64%|██████▍ | 4727/7378 [16:13:09<9:06:02, 12.36s/it] + +{'loss': 0.4559, 'learning_rate': 6.041740596008228e-06, 'epoch': 0.64} + + 64%|██████▍ | 4727/7378 [16:13:09<9:06:02, 12.36s/it] + 64%|██████▍ | 4728/7378 [16:13:21<9:03:28, 12.31s/it] + +{'loss': 0.4407, 'learning_rate': 6.0377093889631e-06, 'epoch': 0.64} + + 64%|██████▍ | 4728/7378 [16:13:21<9:03:28, 12.31s/it] + 64%|██████▍ | 4729/7378 [16:13:33<9:00:01, 12.23s/it] + +{'loss': 0.4593, 'learning_rate': 6.033678945587393e-06, 'epoch': 0.64} + + 64%|██████▍ | 4729/7378 [16:13:33<9:00:01, 12.23s/it] + 64%|██████▍ | 4730/7378 [16:13:45<8:59:13, 12.22s/it] + +{'loss': 0.4155, 'learning_rate': 6.029649266657911e-06, 'epoch': 0.64} + + 64%|██████▍ | 4730/7378 [16:13:45<8:59:13, 12.22s/it] + 64%|██████▍ | 4731/7378 [16:13:57<8:58:12, 12.20s/it] + +{'loss': 0.4862, 'learning_rate': 6.025620352951308e-06, 'epoch': 0.64} + + 64%|██████▍ | 4731/7378 [16:13:57<8:58:12, 12.20s/it] + 64%|██████▍ | 4732/7378 [16:14:09<8:57:34, 12.19s/it] + +{'loss': 0.4037, 'learning_rate': 6.0215922052441e-06, 'epoch': 0.64} 
+ + 64%|██████▍ | 4732/7378 [16:14:09<8:57:34, 12.19s/it] + 64%|██████▍ | 4733/7378 [16:14:22<9:00:20, 12.26s/it] + +{'loss': 0.3881, 'learning_rate': 6.0175648243126425e-06, 'epoch': 0.64} + + 64%|██████▍ | 4733/7378 [16:14:22<9:00:20, 12.26s/it] + 64%|██████▍ | 4734/7378 [16:14:34<9:04:59, 12.37s/it] + +{'loss': 0.5348, 'learning_rate': 6.013538210933156e-06, 'epoch': 0.64} + + 64%|██████▍ | 4734/7378 [16:14:34<9:04:59, 12.37s/it] + 64%|██████▍ | 4735/7378 [16:14:47<9:02:12, 12.31s/it] + +{'loss': 0.4659, 'learning_rate': 6.009512365881703e-06, 'epoch': 0.64} + + 64%|██████▍ | 4735/7378 [16:14:47<9:02:12, 12.31s/it] + 64%|██████▍ | 4736/7378 [16:14:59<9:03:41, 12.35s/it] + +{'loss': 0.4327, 'learning_rate': 6.0054872899342065e-06, 'epoch': 0.64} + + 64%|██████▍ | 4736/7378 [16:14:59<9:03:41, 12.35s/it] + 64%|██████▍ | 4737/7378 [16:15:11<9:03:58, 12.36s/it] + +{'loss': 0.4653, 'learning_rate': 6.001462983866433e-06, 'epoch': 0.64} + + 64%|██████▍ | 4737/7378 [16:15:11<9:03:58, 12.36s/it] + 64%|██████▍ | 4738/7378 [16:15:24<9:00:16, 12.28s/it] + +{'loss': 0.4113, 'learning_rate': 5.997439448454004e-06, 'epoch': 0.64} + + 64%|██████▍ | 4738/7378 [16:15:24<9:00:16, 12.28s/it] + 64%|██████▍ | 4739/7378 [16:15:36<8:58:25, 12.24s/it] + +{'loss': 0.4764, 'learning_rate': 5.993416684472393e-06, 'epoch': 0.64} + + 64%|██████▍ | 4739/7378 [16:15:36<8:58:25, 12.24s/it] + 64%|██████▍ | 4740/7378 [16:15:48<8:56:18, 12.20s/it] + +{'loss': 0.5087, 'learning_rate': 5.989394692696928e-06, 'epoch': 0.64} + + 64%|██████▍ | 4740/7378 [16:15:48<8:56:18, 12.20s/it] + 64%|██████▍ | 4741/7378 [16:16:00<9:00:32, 12.30s/it] + +{'loss': 0.5103, 'learning_rate': 5.985373473902784e-06, 'epoch': 0.64} + + 64%|██████▍ | 4741/7378 [16:16:00<9:00:32, 12.30s/it] + 64%|██████▍ | 4742/7378 [16:16:13<9:04:32, 12.39s/it] + +{'loss': 0.4042, 'learning_rate': 5.981353028864987e-06, 'epoch': 0.64} + + 64%|██████▍ | 4742/7378 [16:16:13<9:04:32, 12.39s/it] + 64%|██████▍ | 4743/7378 [16:16:25<9:06:09, 
12.44s/it] + +{'loss': 0.531, 'learning_rate': 5.977333358358412e-06, 'epoch': 0.64} + + 64%|██████▍ | 4743/7378 [16:16:25<9:06:09, 12.44s/it] + 64%|██████▍ | 4744/7378 [16:16:38<9:06:54, 12.46s/it] + +{'loss': 0.4847, 'learning_rate': 5.9733144631577935e-06, 'epoch': 0.64} + + 64%|██████▍ | 4744/7378 [16:16:38<9:06:54, 12.46s/it] + 64%|██████▍ | 4745/7378 [16:16:50<9:02:43, 12.37s/it] + +{'loss': 0.4929, 'learning_rate': 5.969296344037705e-06, 'epoch': 0.64} + + 64%|██████▍ | 4745/7378 [16:16:50<9:02:43, 12.37s/it] + 64%|██████▍ | 4746/7378 [16:17:02<9:02:03, 12.36s/it] + +{'loss': 0.4748, 'learning_rate': 5.96527900177258e-06, 'epoch': 0.64} + + 64%|██████▍ | 4746/7378 [16:17:02<9:02:03, 12.36s/it] + 64%|██████▍ | 4747/7378 [16:17:15<9:01:04, 12.34s/it] + +{'loss': 0.4159, 'learning_rate': 5.961262437136697e-06, 'epoch': 0.64} + + 64%|██████▍ | 4747/7378 [16:17:15<9:01:04, 12.34s/it] + 64%|██████▍ | 4748/7378 [16:17:27<8:59:26, 12.31s/it] + +{'loss': 0.4229, 'learning_rate': 5.957246650904183e-06, 'epoch': 0.64} + + 64%|██████▍ | 4748/7378 [16:17:27<8:59:26, 12.31s/it] + 64%|██████▍ | 4749/7378 [16:17:39<8:59:44, 12.32s/it] + +{'loss': 0.428, 'learning_rate': 5.953231643849022e-06, 'epoch': 0.64} + + 64%|██████▍ | 4749/7378 [16:17:39<8:59:44, 12.32s/it] + 64%|██████▍ | 4750/7378 [16:17:51<8:57:24, 12.27s/it] + +{'loss': 0.4223, 'learning_rate': 5.949217416745041e-06, 'epoch': 0.64} + + 64%|██████▍ | 4750/7378 [16:17:51<8:57:24, 12.27s/it] + 64%|██████▍ | 4751/7378 [16:18:04<8:55:47, 12.24s/it] + +{'loss': 0.4268, 'learning_rate': 5.945203970365922e-06, 'epoch': 0.64} + + 64%|██████▍ | 4751/7378 [16:18:04<8:55:47, 12.24s/it] + 64%|██████▍ | 4752/7378 [16:18:16<9:00:28, 12.35s/it] + +{'loss': 0.4614, 'learning_rate': 5.941191305485189e-06, 'epoch': 0.64} + + 64%|██████▍ | 4752/7378 [16:18:16<9:00:28, 12.35s/it] + 64%|██████▍ | 4753/7378 [16:18:29<9:06:05, 12.48s/it] + +{'loss': 0.4704, 'learning_rate': 5.937179422876226e-06, 'epoch': 0.64} + + 64%|██████▍ | 
4753/7378 [16:18:29<9:06:05, 12.48s/it] + 64%|██████▍ | 4754/7378 [16:18:41<9:04:48, 12.46s/it] + +{'loss': 0.4425, 'learning_rate': 5.933168323312256e-06, 'epoch': 0.64} + + 64%|██████▍ | 4754/7378 [16:18:41<9:04:48, 12.46s/it] + 64%|██████▍ | 4755/7378 [16:18:54<9:04:20, 12.45s/it] + +{'loss': 0.4758, 'learning_rate': 5.92915800756636e-06, 'epoch': 0.64} + + 64%|██████▍ | 4755/7378 [16:18:54<9:04:20, 12.45s/it] + 64%|██████▍ | 4756/7378 [16:19:06<9:04:04, 12.45s/it] + +{'loss': 0.42, 'learning_rate': 5.92514847641146e-06, 'epoch': 0.64} + + 64%|██████▍ | 4756/7378 [16:19:06<9:04:04, 12.45s/it] + 64%|██████▍ | 4757/7378 [16:19:19<9:02:22, 12.42s/it] + +{'loss': 0.4037, 'learning_rate': 5.921139730620331e-06, 'epoch': 0.64} + + 64%|██████▍ | 4757/7378 [16:19:19<9:02:22, 12.42s/it] + 64%|██████▍ | 4758/7378 [16:19:31<8:58:36, 12.33s/it] + +{'loss': 0.4706, 'learning_rate': 5.917131770965596e-06, 'epoch': 0.64} + + 64%|██████▍ | 4758/7378 [16:19:31<8:58:36, 12.33s/it] + 65%|██████▍ | 4759/7378 [16:19:43<9:00:40, 12.39s/it] + +{'loss': 0.4183, 'learning_rate': 5.913124598219726e-06, 'epoch': 0.65} + + 65%|██████▍ | 4759/7378 [16:19:43<9:00:40, 12.39s/it] + 65%|██████▍ | 4760/7378 [16:19:56<9:04:51, 12.49s/it] + +{'loss': 0.4805, 'learning_rate': 5.909118213155044e-06, 'epoch': 0.65} + + 65%|██████▍ | 4760/7378 [16:19:56<9:04:51, 12.49s/it] + 65%|██████▍ | 4761/7378 [16:20:08<9:01:00, 12.40s/it] + +{'loss': 0.4215, 'learning_rate': 5.9051126165437134e-06, 'epoch': 0.65} + + 65%|██████▍ | 4761/7378 [16:20:08<9:01:00, 12.40s/it] + 65%|██████▍ | 4762/7378 [16:20:21<9:00:44, 12.40s/it] + +{'loss': 0.4506, 'learning_rate': 5.901107809157753e-06, 'epoch': 0.65} + + 65%|██████▍ | 4762/7378 [16:20:21<9:00:44, 12.40s/it] + 65%|██████▍ | 4763/7378 [16:20:33<9:00:41, 12.41s/it] + +{'loss': 0.4381, 'learning_rate': 5.897103791769024e-06, 'epoch': 0.65} + + 65%|██████▍ | 4763/7378 [16:20:33<9:00:41, 12.41s/it] + 65%|██████▍ | 4764/7378 [16:20:46<9:03:45, 12.48s/it] + +{'loss': 
0.4718, 'learning_rate': 5.893100565149243e-06, 'epoch': 0.65} + + 65%|██████▍ | 4764/7378 [16:20:46<9:03:45, 12.48s/it] + 65%|██████▍ | 4765/7378 [16:20:58<9:01:18, 12.43s/it] + +{'loss': 0.4785, 'learning_rate': 5.889098130069965e-06, 'epoch': 0.65} + + 65%|██████▍ | 4765/7378 [16:20:58<9:01:18, 12.43s/it] + 65%|██████▍ | 4766/7378 [16:21:10<9:00:35, 12.42s/it] + +{'loss': 0.4787, 'learning_rate': 5.885096487302595e-06, 'epoch': 0.65} + + 65%|██████▍ | 4766/7378 [16:21:10<9:00:35, 12.42s/it] + 65%|██████▍ | 4767/7378 [16:21:23<8:56:30, 12.33s/it] + +{'loss': 0.4722, 'learning_rate': 5.881095637618392e-06, 'epoch': 0.65} + + 65%|██████▍ | 4767/7378 [16:21:23<8:56:30, 12.33s/it] + 65%|██████▍ | 4768/7378 [16:21:35<8:57:52, 12.36s/it] + +{'loss': 0.4585, 'learning_rate': 5.877095581788454e-06, 'epoch': 0.65} + + 65%|██████▍ | 4768/7378 [16:21:35<8:57:52, 12.36s/it] + 65%|██████▍ | 4769/7378 [16:21:48<9:00:20, 12.43s/it] + +{'loss': 0.4132, 'learning_rate': 5.87309632058373e-06, 'epoch': 0.65} + + 65%|██████▍ | 4769/7378 [16:21:48<9:00:20, 12.43s/it] + 65%|██████▍ | 4770/7378 [16:22:00<8:56:31, 12.34s/it] + +{'loss': 0.4273, 'learning_rate': 5.8690978547750134e-06, 'epoch': 0.65} + + 65%|██████▍ | 4770/7378 [16:22:00<8:56:31, 12.34s/it] + 65%|██████▍ | 4771/7378 [16:22:12<8:57:07, 12.36s/it] + +{'loss': 0.5183, 'learning_rate': 5.865100185132948e-06, 'epoch': 0.65} + + 65%|██████▍ | 4771/7378 [16:22:12<8:57:07, 12.36s/it] + 65%|██████▍ | 4772/7378 [16:22:25<8:59:42, 12.43s/it] + +{'loss': 0.4612, 'learning_rate': 5.8611033124280225e-06, 'epoch': 0.65} + + 65%|██████▍ | 4772/7378 [16:22:25<8:59:42, 12.43s/it] + 65%|██████▍ | 4773/7378 [16:22:37<9:01:18, 12.47s/it] + +{'loss': 0.4244, 'learning_rate': 5.857107237430567e-06, 'epoch': 0.65} + + 65%|██████▍ | 4773/7378 [16:22:37<9:01:18, 12.47s/it] + 65%|██████▍ | 4774/7378 [16:22:50<8:59:57, 12.44s/it] + +{'loss': 0.4833, 'learning_rate': 5.853111960910768e-06, 'epoch': 0.65} + + 65%|██████▍ | 4774/7378 
[16:22:50<8:59:57, 12.44s/it] + 65%|██████▍ | 4775/7378 [16:23:02<8:58:42, 12.42s/it] + +{'loss': 0.4493, 'learning_rate': 5.849117483638648e-06, 'epoch': 0.65} + + 65%|██████▍ | 4775/7378 [16:23:02<8:58:42, 12.42s/it] + 65%|██████▍ | 4776/7378 [16:23:14<8:58:09, 12.41s/it] + +{'loss': 0.4575, 'learning_rate': 5.845123806384083e-06, 'epoch': 0.65} + + 65%|██████▍ | 4776/7378 [16:23:14<8:58:09, 12.41s/it] + 65%|██████▍ | 4777/7378 [16:23:27<8:55:53, 12.36s/it] + +{'loss': 0.4953, 'learning_rate': 5.841130929916788e-06, 'epoch': 0.65} + + 65%|██████▍ | 4777/7378 [16:23:27<8:55:53, 12.36s/it] + 65%|██████▍ | 4778/7378 [16:23:39<8:56:25, 12.38s/it] + +{'loss': 0.4101, 'learning_rate': 5.83713885500633e-06, 'epoch': 0.65} + + 65%|██████▍ | 4778/7378 [16:23:39<8:56:25, 12.38s/it] + 65%|██████▍ | 4779/7378 [16:23:51<8:52:16, 12.29s/it] + +{'loss': 0.4796, 'learning_rate': 5.8331475824221215e-06, 'epoch': 0.65} + + 65%|██████▍ | 4779/7378 [16:23:51<8:52:16, 12.29s/it] + 65%|██████▍ | 4780/7378 [16:24:03<8:48:50, 12.21s/it] + +{'loss': 0.4131, 'learning_rate': 5.8291571129334145e-06, 'epoch': 0.65} + + 65%|██████▍ | 4780/7378 [16:24:03<8:48:50, 12.21s/it] + 65%|██████▍ | 4781/7378 [16:24:15<8:43:28, 12.09s/it] + +{'loss': 0.417, 'learning_rate': 5.82516744730931e-06, 'epoch': 0.65} + + 65%|██████▍ | 4781/7378 [16:24:15<8:43:28, 12.09s/it] + 65%|██████▍ | 4782/7378 [16:24:27<8:43:12, 12.09s/it] + +{'loss': 0.4537, 'learning_rate': 5.821178586318747e-06, 'epoch': 0.65} + + 65%|██████▍ | 4782/7378 [16:24:27<8:43:12, 12.09s/it] + 65%|██████▍ | 4783/7378 [16:24:39<8:45:45, 12.16s/it] + +{'loss': 0.4494, 'learning_rate': 5.81719053073053e-06, 'epoch': 0.65} + + 65%|██████▍ | 4783/7378 [16:24:39<8:45:45, 12.16s/it] + 65%|██████▍ | 4784/7378 [16:24:52<8:46:47, 12.18s/it] + +{'loss': 0.4604, 'learning_rate': 5.81320328131328e-06, 'epoch': 0.65} + + 65%|██████▍ | 4784/7378 [16:24:52<8:46:47, 12.18s/it] + 65%|██████▍ | 4785/7378 [16:25:04<8:50:52, 12.28s/it] + +{'loss': 0.4692, 
'learning_rate': 5.8092168388354876e-06, 'epoch': 0.65} + + 65%|██████▍ | 4785/7378 [16:25:04<8:50:52, 12.28s/it] + 65%|██████▍ | 4786/7378 [16:25:16<8:50:07, 12.27s/it] + +{'loss': 0.5018, 'learning_rate': 5.805231204065473e-06, 'epoch': 0.65} + + 65%|██████▍ | 4786/7378 [16:25:16<8:50:07, 12.27s/it] + 65%|██████▍ | 4787/7378 [16:25:29<8:50:03, 12.27s/it] + +{'loss': 0.4504, 'learning_rate': 5.801246377771406e-06, 'epoch': 0.65} + + 65%|██████▍ | 4787/7378 [16:25:29<8:50:03, 12.27s/it] + 65%|██████▍ | 4788/7378 [16:25:41<8:51:00, 12.30s/it] + +{'loss': 0.4067, 'learning_rate': 5.797262360721292e-06, 'epoch': 0.65} + + 65%|██████▍ | 4788/7378 [16:25:41<8:51:00, 12.30s/it] + 65%|██████▍ | 4789/7378 [16:25:53<8:46:59, 12.21s/it] + +{'loss': 0.4058, 'learning_rate': 5.793279153682999e-06, 'epoch': 0.65} + + 65%|██████▍ | 4789/7378 [16:25:53<8:46:59, 12.21s/it] + 65%|██████▍ | 4790/7378 [16:26:05<8:47:17, 12.22s/it] + +{'loss': 0.4181, 'learning_rate': 5.7892967574242235e-06, 'epoch': 0.65} + + 65%|██████▍ | 4790/7378 [16:26:05<8:47:17, 12.22s/it] + 65%|██████▍ | 4791/7378 [16:26:18<8:49:24, 12.28s/it] + +{'loss': 0.4659, 'learning_rate': 5.785315172712507e-06, 'epoch': 0.65} + + 65%|██████▍ | 4791/7378 [16:26:18<8:49:24, 12.28s/it] + 65%|██████▍ | 4792/7378 [16:26:30<8:53:35, 12.38s/it] + +{'loss': 0.4944, 'learning_rate': 5.781334400315241e-06, 'epoch': 0.65} + + 65%|██████▍ | 4792/7378 [16:26:30<8:53:35, 12.38s/it] + 65%|██████▍ | 4793/7378 [16:26:42<8:50:03, 12.30s/it] + +{'loss': 0.4394, 'learning_rate': 5.777354440999652e-06, 'epoch': 0.65} + + 65%|██████▍ | 4793/7378 [16:26:42<8:50:03, 12.30s/it] + 65%|██████▍ | 4794/7378 [16:26:55<8:51:11, 12.33s/it] + +{'loss': 0.3908, 'learning_rate': 5.773375295532821e-06, 'epoch': 0.65} + + 65%|██████▍ | 4794/7378 [16:26:55<8:51:11, 12.33s/it] + 65%|██████▍ | 4795/7378 [16:27:07<8:53:05, 12.38s/it] + +{'loss': 0.3761, 'learning_rate': 5.7693969646816665e-06, 'epoch': 0.65} + + 65%|██████▍ | 4795/7378 [16:27:07<8:53:05, 
12.38s/it] + 65%|██████▌ | 4796/7378 [16:27:19<8:48:01, 12.27s/it] + +{'loss': 0.3766, 'learning_rate': 5.765419449212944e-06, 'epoch': 0.65} + + 65%|██████▌ | 4796/7378 [16:27:19<8:48:01, 12.27s/it] + 65%|██████▌ | 4797/7378 [16:27:32<8:50:40, 12.34s/it] + +{'loss': 0.4197, 'learning_rate': 5.761442749893256e-06, 'epoch': 0.65} + + 65%|██████▌ | 4797/7378 [16:27:32<8:50:40, 12.34s/it] + 65%|██████▌ | 4798/7378 [16:27:44<8:49:38, 12.32s/it] + +{'loss': 0.5014, 'learning_rate': 5.757466867489056e-06, 'epoch': 0.65} + + 65%|██████▌ | 4798/7378 [16:27:44<8:49:38, 12.32s/it] + 65%|██████▌ | 4799/7378 [16:27:56<8:48:21, 12.29s/it] + +{'loss': 0.5004, 'learning_rate': 5.753491802766631e-06, 'epoch': 0.65} + + 65%|██████▌ | 4799/7378 [16:27:56<8:48:21, 12.29s/it] + 65%|██████▌ | 4800/7378 [16:28:08<8:42:28, 12.16s/it] + +{'loss': 0.406, 'learning_rate': 5.74951755649211e-06, 'epoch': 0.65} + + 65%|██████▌ | 4800/7378 [16:28:08<8:42:28, 12.16s/it] + 65%|██████▌ | 4801/7378 [16:28:21<8:48:03, 12.29s/it] + +{'loss': 0.4619, 'learning_rate': 5.745544129431467e-06, 'epoch': 0.65} + + 65%|██████▌ | 4801/7378 [16:28:21<8:48:03, 12.29s/it] + 65%|██████▌ | 4802/7378 [16:28:33<8:45:18, 12.24s/it] + +{'loss': 0.4847, 'learning_rate': 5.741571522350515e-06, 'epoch': 0.65} + + 65%|██████▌ | 4802/7378 [16:28:33<8:45:18, 12.24s/it] + 65%|██████▌ | 4803/7378 [16:28:45<8:39:19, 12.10s/it] + +{'loss': 0.4587, 'learning_rate': 5.73759973601492e-06, 'epoch': 0.65} + + 65%|██████▌ | 4803/7378 [16:28:45<8:39:19, 12.10s/it] + 65%|██████▌ | 4804/7378 [16:28:57<8:42:57, 12.19s/it] + +{'loss': 0.433, 'learning_rate': 5.7336287711901774e-06, 'epoch': 0.65} + + 65%|██████▌ | 4804/7378 [16:28:57<8:42:57, 12.19s/it] + 65%|██████▌ | 4805/7378 [16:29:09<8:43:45, 12.21s/it] + +{'loss': 0.4704, 'learning_rate': 5.729658628641628e-06, 'epoch': 0.65} + + 65%|██████▌ | 4805/7378 [16:29:09<8:43:45, 12.21s/it] + 65%|██████▌ | 4806/7378 [16:29:22<8:50:22, 12.37s/it] + +{'loss': 0.53, 'learning_rate': 
5.725689309134448e-06, 'epoch': 0.65} + + 65%|██████▌ | 4806/7378 [16:29:22<8:50:22, 12.37s/it] + 65%|██████▌ | 4807/7378 [16:29:35<8:57:29, 12.54s/it] + +{'loss': 0.4609, 'learning_rate': 5.721720813433673e-06, 'epoch': 0.65} + + 65%|██████▌ | 4807/7378 [16:29:35<8:57:29, 12.54s/it] + 65%|██████▌ | 4808/7378 [16:29:47<8:54:04, 12.47s/it] + +{'loss': 0.4633, 'learning_rate': 5.7177531423041655e-06, 'epoch': 0.65} + + 65%|██████▌ | 4808/7378 [16:29:47<8:54:04, 12.47s/it] + 65%|██████▌ | 4809/7378 [16:30:00<9:00:54, 12.63s/it] + +{'loss': 0.4375, 'learning_rate': 5.7137862965106275e-06, 'epoch': 0.65} + + 65%|██████▌ | 4809/7378 [16:30:00<9:00:54, 12.63s/it] + 65%|██████▌ | 4810/7378 [16:30:13<8:59:04, 12.60s/it] + +{'loss': 0.4406, 'learning_rate': 5.709820276817609e-06, 'epoch': 0.65} + + 65%|██████▌ | 4810/7378 [16:30:13<8:59:04, 12.60s/it] + 65%|██████▌ | 4811/7378 [16:30:25<8:57:31, 12.56s/it] + +{'loss': 0.4989, 'learning_rate': 5.705855083989493e-06, 'epoch': 0.65} + + 65%|██████▌ | 4811/7378 [16:30:25<8:57:31, 12.56s/it] + 65%|██████▌ | 4812/7378 [16:30:38<8:58:54, 12.60s/it] + +{'loss': 0.4957, 'learning_rate': 5.701890718790519e-06, 'epoch': 0.65} + + 65%|██████▌ | 4812/7378 [16:30:38<8:58:54, 12.60s/it] + 65%|██████▌ | 4813/7378 [16:30:50<8:54:30, 12.50s/it] + +{'loss': 0.4029, 'learning_rate': 5.697927181984749e-06, 'epoch': 0.65} + + 65%|██████▌ | 4813/7378 [16:30:50<8:54:30, 12.50s/it] + 65%|██████▌ | 4814/7378 [16:31:03<8:51:47, 12.44s/it] + +{'loss': 0.4651, 'learning_rate': 5.693964474336093e-06, 'epoch': 0.65} + + 65%|██████▌ | 4814/7378 [16:31:03<8:51:47, 12.44s/it] + 65%|██████▌ | 4815/7378 [16:31:15<8:49:32, 12.40s/it] + +{'loss': 0.3847, 'learning_rate': 5.690002596608304e-06, 'epoch': 0.65} + + 65%|██████▌ | 4815/7378 [16:31:15<8:49:32, 12.40s/it] + 65%|██████▌ | 4816/7378 [16:31:27<8:50:35, 12.43s/it] + +{'loss': 0.441, 'learning_rate': 5.686041549564964e-06, 'epoch': 0.65} + + 65%|██████▌ | 4816/7378 [16:31:27<8:50:35, 12.43s/it] + 
65%|██████▌ | 4817/7378 [16:31:39<8:44:27, 12.29s/it] + +{'loss': 0.5145, 'learning_rate': 5.682081333969513e-06, 'epoch': 0.65} + + 65%|██████▌ | 4817/7378 [16:31:39<8:44:27, 12.29s/it] + 65%|██████▌ | 4818/7378 [16:31:52<8:49:33, 12.41s/it] + +{'loss': 0.4764, 'learning_rate': 5.678121950585216e-06, 'epoch': 0.65} + + 65%|██████▌ | 4818/7378 [16:31:52<8:49:33, 12.41s/it] + 65%|██████▌ | 4819/7378 [16:32:04<8:48:24, 12.39s/it] + +{'loss': 0.4052, 'learning_rate': 5.674163400175181e-06, 'epoch': 0.65} + + 65%|██████▌ | 4819/7378 [16:32:04<8:48:24, 12.39s/it] + 65%|██████▌ | 4820/7378 [16:32:17<8:46:23, 12.35s/it] + +{'loss': 0.4984, 'learning_rate': 5.670205683502353e-06, 'epoch': 0.65} + + 65%|██████▌ | 4820/7378 [16:32:17<8:46:23, 12.35s/it] + 65%|██████▌ | 4821/7378 [16:32:29<8:45:17, 12.33s/it] + +{'loss': 0.4355, 'learning_rate': 5.66624880132953e-06, 'epoch': 0.65} + + 65%|██████▌ | 4821/7378 [16:32:29<8:45:17, 12.33s/it] + 65%|██████▌ | 4822/7378 [16:32:41<8:44:50, 12.32s/it] + +{'loss': 0.4194, 'learning_rate': 5.662292754419332e-06, 'epoch': 0.65} + + 65%|██████▌ | 4822/7378 [16:32:41<8:44:50, 12.32s/it] + 65%|██████▌ | 4823/7378 [16:32:54<8:47:02, 12.38s/it] + +{'loss': 0.4158, 'learning_rate': 5.658337543534227e-06, 'epoch': 0.65} + + 65%|██████▌ | 4823/7378 [16:32:54<8:47:02, 12.38s/it] + 65%|██████▌ | 4824/7378 [16:33:06<8:46:31, 12.37s/it] + +{'loss': 0.4605, 'learning_rate': 5.654383169436519e-06, 'epoch': 0.65} + + 65%|██████▌ | 4824/7378 [16:33:06<8:46:31, 12.37s/it] + 65%|██████▌ | 4825/7378 [16:33:18<8:43:49, 12.31s/it] + +{'loss': 0.5356, 'learning_rate': 5.650429632888348e-06, 'epoch': 0.65} + + 65%|██████▌ | 4825/7378 [16:33:18<8:43:49, 12.31s/it] + 65%|██████▌ | 4826/7378 [16:33:30<8:41:45, 12.27s/it] + +{'loss': 0.4076, 'learning_rate': 5.646476934651699e-06, 'epoch': 0.65} + + 65%|██████▌ | 4826/7378 [16:33:30<8:41:45, 12.27s/it] + 65%|██████▌ | 4827/7378 [16:33:43<8:43:23, 12.31s/it] + +{'loss': 0.4106, 'learning_rate': 
5.6425250754883985e-06, 'epoch': 0.65} + + 65%|██████▌ | 4827/7378 [16:33:43<8:43:23, 12.31s/it] + 65%|██████▌ | 4828/7378 [16:33:55<8:46:06, 12.38s/it] + +{'loss': 0.4635, 'learning_rate': 5.638574056160102e-06, 'epoch': 0.65} + + 65%|██████▌ | 4828/7378 [16:33:55<8:46:06, 12.38s/it] + 65%|██████▌ | 4829/7378 [16:34:08<8:42:36, 12.30s/it] + +{'loss': 0.4534, 'learning_rate': 5.634623877428303e-06, 'epoch': 0.65} + + 65%|██████▌ | 4829/7378 [16:34:08<8:42:36, 12.30s/it] + 65%|██████▌ | 4830/7378 [16:34:20<8:43:58, 12.34s/it] + +{'loss': 0.4235, 'learning_rate': 5.630674540054337e-06, 'epoch': 0.65} + + 65%|██████▌ | 4830/7378 [16:34:20<8:43:58, 12.34s/it] + 65%|██████▌ | 4831/7378 [16:34:33<8:46:57, 12.41s/it] + +{'loss': 0.4589, 'learning_rate': 5.626726044799381e-06, 'epoch': 0.65} + + 65%|██████▌ | 4831/7378 [16:34:33<8:46:57, 12.41s/it] + 65%|██████▌ | 4832/7378 [16:34:45<8:44:21, 12.36s/it] + +{'loss': 0.437, 'learning_rate': 5.622778392424444e-06, 'epoch': 0.65} + + 65%|██████▌ | 4832/7378 [16:34:45<8:44:21, 12.36s/it] + 66%|██████▌ | 4833/7378 [16:34:57<8:48:18, 12.46s/it] + +{'loss': 0.4906, 'learning_rate': 5.6188315836903736e-06, 'epoch': 0.66} + + 66%|██████▌ | 4833/7378 [16:34:57<8:48:18, 12.46s/it] + 66%|██████▌ | 4834/7378 [16:35:10<8:43:14, 12.34s/it] + +{'loss': 0.4353, 'learning_rate': 5.614885619357855e-06, 'epoch': 0.66} + + 66%|██████▌ | 4834/7378 [16:35:10<8:43:14, 12.34s/it] + 66%|██████▌ | 4835/7378 [16:35:22<8:39:08, 12.25s/it] + +{'loss': 0.4003, 'learning_rate': 5.610940500187406e-06, 'epoch': 0.66} + + 66%|██████▌ | 4835/7378 [16:35:22<8:39:08, 12.25s/it] + 66%|██████▌ | 4836/7378 [16:35:34<8:39:10, 12.25s/it] + +{'loss': 0.4002, 'learning_rate': 5.606996226939396e-06, 'epoch': 0.66} + + 66%|██████▌ | 4836/7378 [16:35:34<8:39:10, 12.25s/it] + 66%|██████▌ | 4837/7378 [16:35:46<8:42:35, 12.34s/it] + +{'loss': 0.5182, 'learning_rate': 5.603052800374018e-06, 'epoch': 0.66} + + 66%|██████▌ | 4837/7378 [16:35:46<8:42:35, 12.34s/it] + 
66%|██████▌ | 4838/7378 [16:35:59<8:43:23, 12.36s/it] + +{'loss': 0.4447, 'learning_rate': 5.5991102212513045e-06, 'epoch': 0.66} + + 66%|██████▌ | 4838/7378 [16:35:59<8:43:23, 12.36s/it] + 66%|██████▌ | 4839/7378 [16:36:11<8:38:52, 12.26s/it] + +{'loss': 0.4516, 'learning_rate': 5.595168490331124e-06, 'epoch': 0.66} + + 66%|██████▌ | 4839/7378 [16:36:11<8:38:52, 12.26s/it] + 66%|██████▌ | 4840/7378 [16:36:23<8:36:20, 12.21s/it] + +{'loss': 0.4129, 'learning_rate': 5.5912276083731884e-06, 'epoch': 0.66} + + 66%|██████▌ | 4840/7378 [16:36:23<8:36:20, 12.21s/it] + 66%|██████▌ | 4841/7378 [16:36:35<8:41:10, 12.33s/it] + +{'loss': 0.426, 'learning_rate': 5.5872875761370394e-06, 'epoch': 0.66} + + 66%|██████▌ | 4841/7378 [16:36:35<8:41:10, 12.33s/it] + 66%|██████▌ | 4842/7378 [16:36:47<8:36:33, 12.22s/it] + +{'loss': 0.3976, 'learning_rate': 5.583348394382055e-06, 'epoch': 0.66} + + 66%|██████▌ | 4842/7378 [16:36:47<8:36:33, 12.22s/it] + 66%|██████▌ | 4843/7378 [16:37:00<8:35:29, 12.20s/it] + +{'loss': 0.4358, 'learning_rate': 5.57941006386745e-06, 'epoch': 0.66} + + 66%|██████▌ | 4843/7378 [16:37:00<8:35:29, 12.20s/it] + 66%|██████▌ | 4844/7378 [16:37:12<8:42:48, 12.38s/it] + +{'loss': 0.4914, 'learning_rate': 5.575472585352274e-06, 'epoch': 0.66} + + 66%|██████▌ | 4844/7378 [16:37:12<8:42:48, 12.38s/it] + 66%|██████▌ | 4845/7378 [16:37:25<8:48:19, 12.51s/it] + +{'loss': 0.3937, 'learning_rate': 5.571535959595422e-06, 'epoch': 0.66} + + 66%|██████▌ | 4845/7378 [16:37:25<8:48:19, 12.51s/it] + 66%|██████▌ | 4846/7378 [16:37:37<8:42:32, 12.38s/it] + +{'loss': 0.4404, 'learning_rate': 5.5676001873556105e-06, 'epoch': 0.66} + + 66%|██████▌ | 4846/7378 [16:37:37<8:42:32, 12.38s/it] + 66%|██████▌ | 4847/7378 [16:37:50<8:42:11, 12.38s/it] + +{'loss': 0.4291, 'learning_rate': 5.5636652693914e-06, 'epoch': 0.66} + + 66%|██████▌ | 4847/7378 [16:37:50<8:42:11, 12.38s/it] + 66%|██████▌ | 4848/7378 [16:38:02<8:41:00, 12.36s/it] + +{'loss': 0.4456, 'learning_rate': 
5.559731206461182e-06, 'epoch': 0.66} + + 66%|██████▌ | 4848/7378 [16:38:02<8:41:00, 12.36s/it] + 66%|██████▌ | 4849/7378 [16:38:14<8:40:42, 12.35s/it] + +{'loss': 0.4989, 'learning_rate': 5.555797999323189e-06, 'epoch': 0.66} + + 66%|██████▌ | 4849/7378 [16:38:14<8:40:42, 12.35s/it] + 66%|██████▌ | 4850/7378 [16:38:27<8:46:02, 12.49s/it] + +{'loss': 0.4582, 'learning_rate': 5.551865648735485e-06, 'epoch': 0.66} + + 66%|██████▌ | 4850/7378 [16:38:27<8:46:02, 12.49s/it] + 66%|██████▌ | 4851/7378 [16:38:39<8:42:07, 12.40s/it] + +{'loss': 0.4731, 'learning_rate': 5.547934155455967e-06, 'epoch': 0.66} + + 66%|██████▌ | 4851/7378 [16:38:39<8:42:07, 12.40s/it] + 66%|██████▌ | 4852/7378 [16:38:52<8:40:49, 12.37s/it] + +{'loss': 0.4326, 'learning_rate': 5.544003520242369e-06, 'epoch': 0.66} + + 66%|██████▌ | 4852/7378 [16:38:52<8:40:49, 12.37s/it] + 66%|██████▌ | 4853/7378 [16:39:04<8:36:51, 12.28s/it] + +{'loss': 0.4424, 'learning_rate': 5.540073743852256e-06, 'epoch': 0.66} + + 66%|██████▌ | 4853/7378 [16:39:04<8:36:51, 12.28s/it] + 66%|██████▌ | 4854/7378 [16:39:16<8:39:18, 12.34s/it] + +{'loss': 0.4759, 'learning_rate': 5.536144827043037e-06, 'epoch': 0.66} + + 66%|██████▌ | 4854/7378 [16:39:16<8:39:18, 12.34s/it] + 66%|██████▌ | 4855/7378 [16:39:28<8:33:47, 12.22s/it] + +{'loss': 0.3955, 'learning_rate': 5.532216770571948e-06, 'epoch': 0.66} + + 66%|██████▌ | 4855/7378 [16:39:28<8:33:47, 12.22s/it] + 66%|██████▌ | 4856/7378 [16:39:40<8:29:55, 12.13s/it] + +{'loss': 0.4699, 'learning_rate': 5.528289575196058e-06, 'epoch': 0.66} + + 66%|██████▌ | 4856/7378 [16:39:40<8:29:55, 12.13s/it] + 66%|██████▌ | 4857/7378 [16:39:52<8:29:55, 12.14s/it] + +{'loss': 0.4117, 'learning_rate': 5.524363241672268e-06, 'epoch': 0.66} + + 66%|██████▌ | 4857/7378 [16:39:52<8:29:55, 12.14s/it] + 66%|██████▌ | 4858/7378 [16:40:05<8:34:18, 12.25s/it] + +{'loss': 0.4054, 'learning_rate': 5.520437770757327e-06, 'epoch': 0.66} + + 66%|██████▌ | 4858/7378 [16:40:05<8:34:18, 12.25s/it] + 66%|██████▌ 
| 4859/7378 [16:40:17<8:35:14, 12.27s/it] + +{'loss': 0.4496, 'learning_rate': 5.516513163207804e-06, 'epoch': 0.66} + + 66%|██████▌ | 4859/7378 [16:40:17<8:35:14, 12.27s/it] + 66%|██████▌ | 4860/7378 [16:40:29<8:33:06, 12.23s/it] + +{'loss': 0.4811, 'learning_rate': 5.512589419780106e-06, 'epoch': 0.66} + + 66%|██████▌ | 4860/7378 [16:40:29<8:33:06, 12.23s/it] + 66%|██████▌ | 4861/7378 [16:40:41<8:33:28, 12.24s/it] + +{'loss': 0.4681, 'learning_rate': 5.50866654123047e-06, 'epoch': 0.66} + + 66%|██████▌ | 4861/7378 [16:40:41<8:33:28, 12.24s/it] + 66%|██████▌ | 4862/7378 [16:40:54<8:36:46, 12.32s/it] + +{'loss': 0.4674, 'learning_rate': 5.504744528314967e-06, 'epoch': 0.66} + + 66%|██████▌ | 4862/7378 [16:40:54<8:36:46, 12.32s/it] + 66%|██████▌ | 4863/7378 [16:41:06<8:35:53, 12.31s/it] + +{'loss': 0.4259, 'learning_rate': 5.5008233817895126e-06, 'epoch': 0.66} + + 66%|██████▌ | 4863/7378 [16:41:06<8:35:53, 12.31s/it] + 66%|██████▌ | 4864/7378 [16:41:18<8:28:52, 12.15s/it] + +{'loss': 0.4181, 'learning_rate': 5.496903102409843e-06, 'epoch': 0.66} + + 66%|██████▌ | 4864/7378 [16:41:18<8:28:52, 12.15s/it] + 66%|██████▌ | 4865/7378 [16:41:30<8:28:48, 12.15s/it] + +{'loss': 0.4149, 'learning_rate': 5.492983690931528e-06, 'epoch': 0.66} + + 66%|██████▌ | 4865/7378 [16:41:30<8:28:48, 12.15s/it] + 66%|██████▌ | 4866/7378 [16:41:42<8:30:54, 12.20s/it] + +{'loss': 0.4249, 'learning_rate': 5.4890651481099736e-06, 'epoch': 0.66} + + 66%|██████▌ | 4866/7378 [16:41:42<8:30:54, 12.20s/it] + 66%|██████▌ | 4867/7378 [16:41:55<8:33:30, 12.27s/it] + +{'loss': 0.4572, 'learning_rate': 5.485147474700415e-06, 'epoch': 0.66} + + 66%|██████▌ | 4867/7378 [16:41:55<8:33:30, 12.27s/it] + 66%|██████▌ | 4868/7378 [16:42:07<8:33:33, 12.28s/it] + +{'loss': 0.4698, 'learning_rate': 5.481230671457929e-06, 'epoch': 0.66} + + 66%|██████▌ | 4868/7378 [16:42:07<8:33:33, 12.28s/it] + 66%|██████▌ | 4869/7378 [16:42:20<8:36:38, 12.35s/it] + +{'loss': 0.4139, 'learning_rate': 5.4773147391374136e-06, 
'epoch': 0.66} + + 66%|██████▌ | 4869/7378 [16:42:20<8:36:38, 12.35s/it] + 66%|██████▌ | 4870/7378 [16:42:32<8:39:03, 12.42s/it] + +{'loss': 0.4719, 'learning_rate': 5.473399678493601e-06, 'epoch': 0.66} + + 66%|██████▌ | 4870/7378 [16:42:32<8:39:03, 12.42s/it] + 66%|██████▌ | 4871/7378 [16:42:44<8:29:52, 12.20s/it] + +{'loss': 0.4151, 'learning_rate': 5.469485490281064e-06, 'epoch': 0.66} + + 66%|██████▌ | 4871/7378 [16:42:44<8:29:52, 12.20s/it] + 66%|██████▌ | 4872/7378 [16:42:56<8:33:35, 12.30s/it] + +{'loss': 0.4399, 'learning_rate': 5.465572175254195e-06, 'epoch': 0.66} + + 66%|██████▌ | 4872/7378 [16:42:56<8:33:35, 12.30s/it] + 66%|██████▌ | 4873/7378 [16:43:09<8:30:33, 12.23s/it] + +{'loss': 0.4299, 'learning_rate': 5.461659734167229e-06, 'epoch': 0.66} + + 66%|██████▌ | 4873/7378 [16:43:09<8:30:33, 12.23s/it] + 66%|██████▌ | 4874/7378 [16:43:21<8:30:32, 12.23s/it] + +{'loss': 0.4114, 'learning_rate': 5.457748167774228e-06, 'epoch': 0.66} + + 66%|██████▌ | 4874/7378 [16:43:21<8:30:32, 12.23s/it] + 66%|██████▌ | 4875/7378 [16:43:34<8:41:40, 12.51s/it] + +{'loss': 0.446, 'learning_rate': 5.453837476829083e-06, 'epoch': 0.66} + + 66%|██████▌ | 4875/7378 [16:43:34<8:41:40, 12.51s/it] + 66%|██████▌ | 4876/7378 [16:43:46<8:36:39, 12.39s/it] + +{'loss': 0.3807, 'learning_rate': 5.449927662085517e-06, 'epoch': 0.66} + + 66%|██████▌ | 4876/7378 [16:43:46<8:36:39, 12.39s/it] + 66%|██████▌ | 4877/7378 [16:43:59<8:37:54, 12.42s/it] + +{'loss': 0.3987, 'learning_rate': 5.446018724297082e-06, 'epoch': 0.66} + + 66%|██████▌ | 4877/7378 [16:43:59<8:37:54, 12.42s/it] + 66%|██████▌ | 4878/7378 [16:44:11<8:38:55, 12.45s/it] + +{'loss': 0.4814, 'learning_rate': 5.442110664217175e-06, 'epoch': 0.66} + + 66%|██████▌ | 4878/7378 [16:44:11<8:38:55, 12.45s/it] + 66%|██████▌ | 4879/7378 [16:44:24<8:40:15, 12.49s/it] + +{'loss': 0.4444, 'learning_rate': 5.438203482599007e-06, 'epoch': 0.66} + + 66%|██████▌ | 4879/7378 [16:44:24<8:40:15, 12.49s/it] + 66%|██████▌ | 4880/7378
[16:44:36<8:38:02, 12.44s/it] + +{'loss': 0.442, 'learning_rate': 5.434297180195626e-06, 'epoch': 0.66} + + 66%|██████▌ | 4880/7378 [16:44:36<8:38:02, 12.44s/it] + 66%|██████▌ | 4881/7378 [16:44:48<8:36:27, 12.41s/it] + +{'loss': 0.4523, 'learning_rate': 5.430391757759907e-06, 'epoch': 0.66} + + 66%|██████▌ | 4881/7378 [16:44:48<8:36:27, 12.41s/it] + 66%|██████▌ | 4882/7378 [16:45:01<8:36:48, 12.42s/it] + +{'loss': 0.4061, 'learning_rate': 5.426487216044569e-06, 'epoch': 0.66} + + 66%|██████▌ | 4882/7378 [16:45:01<8:36:48, 12.42s/it] + 66%|██████▌ | 4883/7378 [16:45:13<8:30:15, 12.27s/it] + +{'loss': 0.4592, 'learning_rate': 5.422583555802144e-06, 'epoch': 0.66} + + 66%|██████▌ | 4883/7378 [16:45:13<8:30:15, 12.27s/it] + 66%|██████▌ | 4884/7378 [16:45:25<8:30:20, 12.28s/it] + +{'loss': 0.512, 'learning_rate': 5.418680777785003e-06, 'epoch': 0.66} + + 66%|██████▌ | 4884/7378 [16:45:25<8:30:20, 12.28s/it] + 66%|██████▌ | 4885/7378 [16:45:38<8:35:45, 12.41s/it] + +{'loss': 0.477, 'learning_rate': 5.414778882745346e-06, 'epoch': 0.66} + + 66%|██████▌ | 4885/7378 [16:45:38<8:35:45, 12.41s/it] + 66%|██████▌ | 4886/7378 [16:45:50<8:34:00, 12.38s/it] + +{'loss': 0.4565, 'learning_rate': 5.410877871435196e-06, 'epoch': 0.66} + + 66%|██████▌ | 4886/7378 [16:45:50<8:34:00, 12.38s/it] + 66%|██████▌ | 4887/7378 [16:46:03<8:38:44, 12.49s/it] + +{'loss': 0.4876, 'learning_rate': 5.406977744606421e-06, 'epoch': 0.66} + + 66%|██████▌ | 4887/7378 [16:46:03<8:38:44, 12.49s/it] + 66%|██████▋ | 4888/7378 [16:46:15<8:39:13, 12.51s/it] + +{'loss': 0.4201, 'learning_rate': 5.403078503010706e-06, 'epoch': 0.66} + + 66%|██████▋ | 4888/7378 [16:46:15<8:39:13, 12.51s/it] + 66%|██████▋ | 4889/7378 [16:46:28<8:37:25, 12.47s/it] + +{'loss': 0.4227, 'learning_rate': 5.3991801473995675e-06, 'epoch': 0.66} + + 66%|██████▋ | 4889/7378 [16:46:28<8:37:25, 12.47s/it] + 66%|██████▋ | 4890/7378 [16:46:40<8:33:45, 12.39s/it] + +{'loss': 0.4015, 'learning_rate': 5.39528267852435e-06, 'epoch': 0.66} + + 
66%|██████▋ | 4890/7378 [16:46:40<8:33:45, 12.39s/it] + 66%|██████▋ | 4891/7378 [16:46:52<8:26:59, 12.23s/it] + +{'loss': 0.4296, 'learning_rate': 5.391386097136234e-06, 'epoch': 0.66} + + 66%|██████▋ | 4891/7378 [16:46:52<8:26:59, 12.23s/it] + 66%|██████▋ | 4892/7378 [16:47:04<8:24:32, 12.18s/it] + +{'loss': 0.457, 'learning_rate': 5.387490403986224e-06, 'epoch': 0.66} + + 66%|██████▋ | 4892/7378 [16:47:04<8:24:32, 12.18s/it] + 66%|██████▋ | 4893/7378 [16:47:16<8:27:58, 12.27s/it] + +{'loss': 0.4765, 'learning_rate': 5.383595599825154e-06, 'epoch': 0.66} + + 66%|██████▋ | 4893/7378 [16:47:16<8:27:58, 12.27s/it] + 66%|██████▋ | 4894/7378 [16:47:29<8:30:52, 12.34s/it] + +{'loss': 0.3712, 'learning_rate': 5.3797016854036845e-06, 'epoch': 0.66} + + 66%|██████▋ | 4894/7378 [16:47:29<8:30:52, 12.34s/it] + 66%|██████▋ | 4895/7378 [16:47:41<8:30:44, 12.34s/it] + +{'loss': 0.4833, 'learning_rate': 5.375808661472304e-06, 'epoch': 0.66} + + 66%|██████▋ | 4895/7378 [16:47:41<8:30:44, 12.34s/it] + 66%|██████▋ | 4896/7378 [16:47:53<8:25:58, 12.23s/it] + +{'loss': 0.4465, 'learning_rate': 5.371916528781338e-06, 'epoch': 0.66} + + 66%|██████▋ | 4896/7378 [16:47:53<8:25:58, 12.23s/it] + 66%|██████▋ | 4897/7378 [16:48:05<8:22:55, 12.16s/it] + +{'loss': 0.3963, 'learning_rate': 5.368025288080931e-06, 'epoch': 0.66} + + 66%|██████▋ | 4897/7378 [16:48:05<8:22:55, 12.16s/it] + 66%|██████▋ | 4898/7378 [16:48:18<8:25:43, 12.24s/it] + +{'loss': 0.4237, 'learning_rate': 5.36413494012106e-06, 'epoch': 0.66} + + 66%|██████▋ | 4898/7378 [16:48:18<8:25:43, 12.24s/it] + 66%|██████▋ | 4899/7378 [16:48:30<8:22:32, 12.16s/it] + +{'loss': 0.4714, 'learning_rate': 5.360245485651523e-06, 'epoch': 0.66} + + 66%|██████▋ | 4899/7378 [16:48:30<8:22:32, 12.16s/it] + 66%|██████▋ | 4900/7378 [16:48:42<8:23:24, 12.19s/it] + +{'loss': 0.492, 'learning_rate': 5.356356925421959e-06, 'epoch': 0.66} + + 66%|██████▋ | 4900/7378 [16:48:42<8:23:24, 12.19s/it] + 66%|██████▋ | 4901/7378 [16:48:54<8:24:42, 12.23s/it]
+ +{'loss': 0.4335, 'learning_rate': 5.352469260181825e-06, 'epoch': 0.66} + + 66%|██████▋ | 4901/7378 [16:48:54<8:24:42, 12.23s/it] + 66%|██████▋ | 4902/7378 [16:49:06<8:18:23, 12.08s/it] + +{'loss': 0.4115, 'learning_rate': 5.348582490680405e-06, 'epoch': 0.66} + + 66%|██████▋ | 4902/7378 [16:49:06<8:18:23, 12.08s/it] + 66%|██████▋ | 4903/7378 [16:49:18<8:23:19, 12.20s/it] + +{'loss': 0.415, 'learning_rate': 5.344696617666815e-06, 'epoch': 0.66} + + 66%|██████▋ | 4903/7378 [16:49:18<8:23:19, 12.20s/it] + 66%|██████▋ | 4904/7378 [16:49:31<8:23:04, 12.20s/it] + +{'loss': 0.4629, 'learning_rate': 5.340811641889991e-06, 'epoch': 0.66} + + 66%|██████▋ | 4904/7378 [16:49:31<8:23:04, 12.20s/it] + 66%|██████▋ | 4905/7378 [16:49:43<8:21:23, 12.16s/it] + +{'loss': 0.5175, 'learning_rate': 5.336927564098712e-06, 'epoch': 0.66} + + 66%|██████▋ | 4905/7378 [16:49:43<8:21:23, 12.16s/it] + 66%|██████▋ | 4906/7378 [16:49:55<8:21:15, 12.17s/it] + +{'loss': 0.4682, 'learning_rate': 5.333044385041565e-06, 'epoch': 0.66} + + 66%|██████▋ | 4906/7378 [16:49:55<8:21:15, 12.17s/it] + 67%|██████▋ | 4907/7378 [16:50:07<8:17:02, 12.07s/it] + +{'loss': 0.4031, 'learning_rate': 5.329162105466974e-06, 'epoch': 0.67} + + 67%|██████▋ | 4907/7378 [16:50:07<8:17:02, 12.07s/it] + 67%|██████▋ | 4908/7378 [16:50:19<8:14:57, 12.02s/it] + +{'loss': 0.537, 'learning_rate': 5.325280726123182e-06, 'epoch': 0.67} + + 67%|██████▋ | 4908/7378 [16:50:19<8:14:57, 12.02s/it] + 67%|██████▋ | 4909/7378 [16:50:31<8:19:35, 12.14s/it] + +{'loss': 0.4453, 'learning_rate': 5.321400247758275e-06, 'epoch': 0.67} + + 67%|██████▋ | 4909/7378 [16:50:31<8:19:35, 12.14s/it] + 67%|██████▋ | 4910/7378 [16:50:43<8:21:06, 12.18s/it] + +{'loss': 0.4001, 'learning_rate': 5.317520671120147e-06, 'epoch': 0.67} + + 67%|██████▋ | 4910/7378 [16:50:43<8:21:06, 12.18s/it] + 67%|██████▋ | 4911/7378 [16:50:56<8:25:10, 12.29s/it] + +{'loss': 0.4591, 'learning_rate': 5.313641996956529e-06, 'epoch': 0.67} + + 67%|██████▋ | 4911/7378 
[16:50:56<8:25:10, 12.29s/it] + 67%|██████▋ | 4912/7378 [16:51:08<8:27:12, 12.34s/it] + +{'loss': 0.4499, 'learning_rate': 5.309764226014972e-06, 'epoch': 0.67} + + 67%|██████▋ | 4912/7378 [16:51:08<8:27:12, 12.34s/it] + 67%|██████▋ | 4913/7378 [16:51:20<8:23:46, 12.26s/it] + +{'loss': 0.3974, 'learning_rate': 5.305887359042851e-06, 'epoch': 0.67} + + 67%|██████▋ | 4913/7378 [16:51:20<8:23:46, 12.26s/it] + 67%|██████▋ | 4914/7378 [16:51:33<8:24:05, 12.28s/it] + +{'loss': 0.4206, 'learning_rate': 5.302011396787379e-06, 'epoch': 0.67} + + 67%|██████▋ | 4914/7378 [16:51:33<8:24:05, 12.28s/it] + 67%|██████▋ | 4915/7378 [16:51:45<8:26:14, 12.33s/it] + +{'loss': 0.4421, 'learning_rate': 5.298136339995589e-06, 'epoch': 0.67} + + 67%|██████▋ | 4915/7378 [16:51:45<8:26:14, 12.33s/it] + 67%|██████▋ | 4916/7378 [16:51:58<8:28:16, 12.39s/it] + +{'loss': 0.456, 'learning_rate': 5.294262189414332e-06, 'epoch': 0.67} + + 67%|██████▋ | 4916/7378 [16:51:58<8:28:16, 12.39s/it] + 67%|██████▋ | 4917/7378 [16:52:10<8:23:53, 12.29s/it] + +{'loss': 0.4567, 'learning_rate': 5.290388945790292e-06, 'epoch': 0.67} + + 67%|██████▋ | 4917/7378 [16:52:10<8:23:53, 12.29s/it] + 67%|██████▋ | 4918/7378 [16:52:22<8:28:37, 12.41s/it] + +{'loss': 0.4647, 'learning_rate': 5.28651660986997e-06, 'epoch': 0.67} + + 67%|██████▋ | 4918/7378 [16:52:22<8:28:37, 12.41s/it] + 67%|██████▋ | 4919/7378 [16:52:35<8:26:23, 12.36s/it] + +{'loss': 0.4129, 'learning_rate': 5.282645182399708e-06, 'epoch': 0.67} + + 67%|██████▋ | 4919/7378 [16:52:35<8:26:23, 12.36s/it] + 67%|██████▋ | 4920/7378 [16:52:47<8:24:42, 12.32s/it] + +{'loss': 0.4094, 'learning_rate': 5.278774664125659e-06, 'epoch': 0.67} + + 67%|██████▋ | 4920/7378 [16:52:47<8:24:42, 12.32s/it] + 67%|██████▋ | 4921/7378 [16:52:59<8:21:28, 12.25s/it] + +{'loss': 0.4488, 'learning_rate': 5.274905055793802e-06, 'epoch': 0.67} + + 67%|██████▋ | 4921/7378 [16:52:59<8:21:28, 12.25s/it] + 67%|██████▋ | 4922/7378 [16:53:11<8:21:16, 12.25s/it] + +{'loss': 0.5229, 
'learning_rate': 5.271036358149946e-06, 'epoch': 0.67} + + 67%|██████▋ | 4922/7378 [16:53:11<8:21:16, 12.25s/it] + 67%|██████▋ | 4923/7378 [16:53:23<8:18:53, 12.19s/it] + +{'loss': 0.4911, 'learning_rate': 5.2671685719397184e-06, 'epoch': 0.67} + + 67%|██████▋ | 4923/7378 [16:53:23<8:18:53, 12.19s/it] + 67%|██████▋ | 4924/7378 [16:53:36<8:20:59, 12.25s/it] + +{'loss': 0.4506, 'learning_rate': 5.263301697908579e-06, 'epoch': 0.67} + + 67%|██████▋ | 4924/7378 [16:53:36<8:20:59, 12.25s/it] + 67%|██████▋ | 4925/7378 [16:53:48<8:18:12, 12.19s/it] + +{'loss': 0.4132, 'learning_rate': 5.2594357368018065e-06, 'epoch': 0.67} + + 67%|██████▋ | 4925/7378 [16:53:48<8:18:12, 12.19s/it] + 67%|██████▋ | 4926/7378 [16:54:00<8:17:40, 12.18s/it] + +{'loss': 0.4634, 'learning_rate': 5.255570689364502e-06, 'epoch': 0.67} + + 67%|██████▋ | 4926/7378 [16:54:00<8:17:40, 12.18s/it] + 67%|██████▋ | 4927/7378 [16:54:12<8:18:50, 12.21s/it] + +{'loss': 0.4443, 'learning_rate': 5.251706556341596e-06, 'epoch': 0.67} + + 67%|██████▋ | 4927/7378 [16:54:12<8:18:50, 12.21s/it] + 67%|██████▋ | 4928/7378 [16:54:24<8:18:32, 12.21s/it] + +{'loss': 0.4359, 'learning_rate': 5.247843338477832e-06, 'epoch': 0.67} + + 67%|██████▋ | 4928/7378 [16:54:24<8:18:32, 12.21s/it] + 67%|██████▋ | 4929/7378 [16:54:37<8:23:46, 12.34s/it] + +{'loss': 0.5117, 'learning_rate': 5.243981036517793e-06, 'epoch': 0.67} + + 67%|██████▋ | 4929/7378 [16:54:37<8:23:46, 12.34s/it] + 67%|██████▋ | 4930/7378 [16:54:49<8:25:20, 12.39s/it] + +{'loss': 0.4326, 'learning_rate': 5.240119651205876e-06, 'epoch': 0.67} + + 67%|██████▋ | 4930/7378 [16:54:49<8:25:20, 12.39s/it] + 67%|██████▋ | 4931/7378 [16:55:02<8:25:32, 12.40s/it] + +{'loss': 0.4655, 'learning_rate': 5.2362591832863005e-06, 'epoch': 0.67} + + 67%|██████▋ | 4931/7378 [16:55:02<8:25:32, 12.40s/it] + 67%|██████▋ | 4932/7378 [16:55:14<8:25:33, 12.40s/it] + +{'loss': 0.5018, 'learning_rate': 5.232399633503107e-06, 'epoch': 0.67} + + 67%|██████▋ | 4932/7378 [16:55:14<8:25:33, 
12.40s/it] + 67%|██████▋ | 4933/7378 [16:55:27<8:25:04, 12.39s/it] + +{'loss': 0.4223, 'learning_rate': 5.228541002600172e-06, 'epoch': 0.67} + + 67%|██████▋ | 4933/7378 [16:55:27<8:25:04, 12.39s/it] + 67%|██████▋ | 4934/7378 [16:55:39<8:26:06, 12.42s/it] + +{'loss': 0.4305, 'learning_rate': 5.224683291321182e-06, 'epoch': 0.67} + + 67%|██████▋ | 4934/7378 [16:55:39<8:26:06, 12.42s/it] + 67%|██████▋ | 4935/7378 [16:55:51<8:24:04, 12.38s/it] + +{'loss': 0.5073, 'learning_rate': 5.220826500409651e-06, 'epoch': 0.67} + + 67%|██████▋ | 4935/7378 [16:55:51<8:24:04, 12.38s/it] + 67%|██████▋ | 4936/7378 [16:56:04<8:20:57, 12.31s/it] + +{'loss': 0.4413, 'learning_rate': 5.216970630608913e-06, 'epoch': 0.67} + + 67%|██████▋ | 4936/7378 [16:56:04<8:20:57, 12.31s/it] + 67%|██████▋ | 4937/7378 [16:56:16<8:20:24, 12.30s/it] + +{'loss': 0.4394, 'learning_rate': 5.213115682662124e-06, 'epoch': 0.67} + + 67%|██████▋ | 4937/7378 [16:56:16<8:20:24, 12.30s/it] + 67%|██████▋ | 4938/7378 [16:56:28<8:23:30, 12.38s/it] + +{'loss': 0.4804, 'learning_rate': 5.209261657312274e-06, 'epoch': 0.67} + + 67%|██████▋ | 4938/7378 [16:56:28<8:23:30, 12.38s/it] + 67%|██████▋ | 4939/7378 [16:56:41<8:24:10, 12.40s/it] + +{'loss': 0.453, 'learning_rate': 5.2054085553021595e-06, 'epoch': 0.67} + + 67%|██████▋ | 4939/7378 [16:56:41<8:24:10, 12.40s/it] + 67%|██████▋ | 4940/7378 [16:56:53<8:26:41, 12.47s/it] + +{'loss': 0.4525, 'learning_rate': 5.201556377374406e-06, 'epoch': 0.67} + + 67%|██████▋ | 4940/7378 [16:56:53<8:26:41, 12.47s/it] + 67%|██████▋ | 4941/7378 [16:57:06<8:24:37, 12.42s/it] + +{'loss': 0.4249, 'learning_rate': 5.197705124271459e-06, 'epoch': 0.67} + + 67%|██████▋ | 4941/7378 [16:57:06<8:24:37, 12.42s/it] + 67%|██████▋ | 4942/7378 [16:57:18<8:26:20, 12.47s/it] + +{'loss': 0.4589, 'learning_rate': 5.193854796735592e-06, 'epoch': 0.67} + + 67%|██████▋ | 4942/7378 [16:57:18<8:26:20, 12.47s/it] + 67%|██████▋ | 4943/7378 [16:57:31<8:27:28, 12.50s/it] + +{'loss': 0.3481, 'learning_rate': 
5.190005395508893e-06, 'epoch': 0.67} + + 67%|██████▋ | 4943/7378 [16:57:31<8:27:28, 12.50s/it] + 67%|██████▋ | 4944/7378 [16:57:43<8:18:40, 12.29s/it] + +{'loss': 0.3928, 'learning_rate': 5.186156921333272e-06, 'epoch': 0.67} + + 67%|██████▋ | 4944/7378 [16:57:43<8:18:40, 12.29s/it] + 67%|██████▋ | 4945/7378 [16:57:55<8:23:40, 12.42s/it] + +{'loss': 0.4057, 'learning_rate': 5.182309374950463e-06, 'epoch': 0.67} + + 67%|██████▋ | 4945/7378 [16:57:55<8:23:40, 12.42s/it] + 67%|██████▋ | 4946/7378 [16:58:08<8:25:52, 12.48s/it] + +{'loss': 0.4187, 'learning_rate': 5.178462757102018e-06, 'epoch': 0.67} + + 67%|██████▋ | 4946/7378 [16:58:08<8:25:52, 12.48s/it] + 67%|██████▋ | 4947/7378 [16:58:20<8:23:17, 12.42s/it] + +{'loss': 0.4167, 'learning_rate': 5.1746170685293186e-06, 'epoch': 0.67} + + 67%|██████▋ | 4947/7378 [16:58:20<8:23:17, 12.42s/it] + 67%|██████▋ | 4948/7378 [16:58:33<8:23:46, 12.44s/it] + +{'loss': 0.4615, 'learning_rate': 5.170772309973558e-06, 'epoch': 0.67} + + 67%|██████▋ | 4948/7378 [16:58:33<8:23:46, 12.44s/it] + 67%|██████▋ | 4949/7378 [16:58:45<8:17:27, 12.29s/it] + +{'loss': 0.3977, 'learning_rate': 5.16692848217575e-06, 'epoch': 0.67} + + 67%|██████▋ | 4949/7378 [16:58:45<8:17:27, 12.29s/it] + 67%|██████▋ | 4950/7378 [16:58:57<8:16:41, 12.27s/it] + +{'loss': 0.4087, 'learning_rate': 5.163085585876733e-06, 'epoch': 0.67} + + 67%|██████▋ | 4950/7378 [16:58:57<8:16:41, 12.27s/it] + 67%|██████▋ | 4951/7378 [16:59:09<8:17:59, 12.31s/it] + +{'loss': 0.4451, 'learning_rate': 5.159243621817169e-06, 'epoch': 0.67} + + 67%|██████▋ | 4951/7378 [16:59:09<8:17:59, 12.31s/it] + 67%|██████▋ | 4952/7378 [16:59:22<8:16:58, 12.29s/it] + +{'loss': 0.4865, 'learning_rate': 5.1554025907375345e-06, 'epoch': 0.67} + + 67%|██████▋ | 4952/7378 [16:59:22<8:16:58, 12.29s/it] + 67%|██████▋ | 4953/7378 [16:59:34<8:14:49, 12.24s/it] + +{'loss': 0.4528, 'learning_rate': 5.151562493378128e-06, 'epoch': 0.67} + + 67%|██████▋ | 4953/7378 [16:59:34<8:14:49, 12.24s/it] + 
67%|██████▋ | 4954/7378 [16:59:46<8:14:11, 12.23s/it] + +{'loss': 0.4786, 'learning_rate': 5.147723330479069e-06, 'epoch': 0.67} + + 67%|██████▋ | 4954/7378 [16:59:46<8:14:11, 12.23s/it] + 67%|██████▋ | 4955/7378 [16:59:58<8:09:42, 12.13s/it] + +{'loss': 0.4295, 'learning_rate': 5.14388510278029e-06, 'epoch': 0.67} + + 67%|██████▋ | 4955/7378 [16:59:58<8:09:42, 12.13s/it] + 67%|██████▋ | 4956/7378 [17:00:10<8:11:58, 12.19s/it] + +{'loss': 0.4617, 'learning_rate': 5.140047811021556e-06, 'epoch': 0.67} + + 67%|██████▋ | 4956/7378 [17:00:10<8:11:58, 12.19s/it] + 67%|██████▋ | 4957/7378 [17:00:22<8:11:41, 12.19s/it] + +{'loss': 0.4293, 'learning_rate': 5.13621145594244e-06, 'epoch': 0.67} + + 67%|██████▋ | 4957/7378 [17:00:22<8:11:41, 12.19s/it] + 67%|██████▋ | 4958/7378 [17:00:34<8:09:08, 12.13s/it] + +{'loss': 0.4456, 'learning_rate': 5.132376038282347e-06, 'epoch': 0.67} + + 67%|██████▋ | 4958/7378 [17:00:34<8:09:08, 12.13s/it] + 67%|██████▋ | 4959/7378 [17:00:47<8:10:22, 12.16s/it] + +{'loss': 0.4476, 'learning_rate': 5.128541558780487e-06, 'epoch': 0.67} + + 67%|██████▋ | 4959/7378 [17:00:47<8:10:22, 12.16s/it] + 67%|██████▋ | 4960/7378 [17:00:59<8:10:40, 12.18s/it] + +{'loss': 0.4296, 'learning_rate': 5.124708018175894e-06, 'epoch': 0.67} + + 67%|██████▋ | 4960/7378 [17:00:59<8:10:40, 12.18s/it] + 67%|██████▋ | 4961/7378 [17:01:11<8:11:56, 12.21s/it] + +{'loss': 0.441, 'learning_rate': 5.120875417207431e-06, 'epoch': 0.67} + + 67%|██████▋ | 4961/7378 [17:01:11<8:11:56, 12.21s/it] + 67%|██████▋ | 4962/7378 [17:01:23<8:13:43, 12.26s/it] + +{'loss': 0.4519, 'learning_rate': 5.117043756613766e-06, 'epoch': 0.67} + + 67%|██████▋ | 4962/7378 [17:01:23<8:13:43, 12.26s/it] + 67%|██████▋ | 4963/7378 [17:01:36<8:16:20, 12.33s/it] + +{'loss': 0.4793, 'learning_rate': 5.113213037133395e-06, 'epoch': 0.67} + + 67%|██████▋ | 4963/7378 [17:01:36<8:16:20, 12.33s/it] + 67%|██████▋ | 4964/7378 [17:01:48<8:18:11, 12.38s/it] + +{'loss': 0.4937, 'learning_rate': 
5.109383259504626e-06, 'epoch': 0.67} + + 67%|██████▋ | 4964/7378 [17:01:48<8:18:11, 12.38s/it] + 67%|██████▋ | 4965/7378 [17:02:01<8:15:08, 12.31s/it] + +{'loss': 0.392, 'learning_rate': 5.105554424465584e-06, 'epoch': 0.67} + + 67%|██████▋ | 4965/7378 [17:02:01<8:15:08, 12.31s/it] + 67%|██████▋ | 4966/7378 [17:02:13<8:18:53, 12.41s/it] + +{'loss': 0.4902, 'learning_rate': 5.101726532754228e-06, 'epoch': 0.67} + + 67%|██████▋ | 4966/7378 [17:02:13<8:18:53, 12.41s/it] + 67%|██████▋ | 4967/7378 [17:02:25<8:16:17, 12.35s/it] + +{'loss': 0.4401, 'learning_rate': 5.0978995851083165e-06, 'epoch': 0.67} + + 67%|██████▋ | 4967/7378 [17:02:25<8:16:17, 12.35s/it] + 67%|██████▋ | 4968/7378 [17:02:38<8:20:03, 12.45s/it] + +{'loss': 0.5014, 'learning_rate': 5.094073582265437e-06, 'epoch': 0.67} + + 67%|██████▋ | 4968/7378 [17:02:38<8:20:03, 12.45s/it] + 67%|██████▋ | 4969/7378 [17:02:50<8:13:20, 12.29s/it] + +{'loss': 0.4358, 'learning_rate': 5.090248524962988e-06, 'epoch': 0.67} + + 67%|██████▋ | 4969/7378 [17:02:50<8:13:20, 12.29s/it] + 67%|██████▋ | 4970/7378 [17:03:02<8:13:46, 12.30s/it] + +{'loss': 0.4716, 'learning_rate': 5.086424413938194e-06, 'epoch': 0.67} + + 67%|██████▋ | 4970/7378 [17:03:02<8:13:46, 12.30s/it] + 67%|██████▋ | 4971/7378 [17:03:15<8:20:41, 12.48s/it] + +{'loss': 0.4586, 'learning_rate': 5.08260124992809e-06, 'epoch': 0.67} + + 67%|██████▋ | 4971/7378 [17:03:15<8:20:41, 12.48s/it] + 67%|██████▋ | 4972/7378 [17:03:27<8:16:52, 12.39s/it] + +{'loss': 0.3906, 'learning_rate': 5.078779033669532e-06, 'epoch': 0.67} + + 67%|██████▋ | 4972/7378 [17:03:28<8:16:52, 12.39s/it] + 67%|██████▋ | 4973/7378 [17:03:40<8:14:51, 12.35s/it] + +{'loss': 0.458, 'learning_rate': 5.07495776589919e-06, 'epoch': 0.67} + + 67%|██████▋ | 4973/7378 [17:03:40<8:14:51, 12.35s/it] + 67%|██████▋ | 4974/7378 [17:03:52<8:13:58, 12.33s/it] + +{'loss': 0.4789, 'learning_rate': 5.071137447353551e-06, 'epoch': 0.67} + + 67%|██████▋ | 4974/7378 [17:03:52<8:13:58, 12.33s/it] + 67%|██████▋ 
| 4975/7378 [17:04:04<8:13:41, 12.33s/it] + +{'loss': 0.4516, 'learning_rate': 5.06731807876893e-06, 'epoch': 0.67} + + 67%|██████▋ | 4975/7378 [17:04:04<8:13:41, 12.33s/it] + 67%|██████▋ | 4976/7378 [17:04:17<8:12:59, 12.31s/it] + +{'loss': 0.402, 'learning_rate': 5.063499660881447e-06, 'epoch': 0.67} + + 67%|██████▋ | 4976/7378 [17:04:17<8:12:59, 12.31s/it] + 67%|██████▋ | 4977/7378 [17:04:29<8:09:56, 12.24s/it] + +{'loss': 0.3836, 'learning_rate': 5.0596821944270406e-06, 'epoch': 0.67} + + 67%|██████▋ | 4977/7378 [17:04:29<8:09:56, 12.24s/it] + 67%|██████▋ | 4978/7378 [17:04:41<8:12:08, 12.30s/it] + +{'loss': 0.4304, 'learning_rate': 5.055865680141463e-06, 'epoch': 0.67} + + 67%|██████▋ | 4978/7378 [17:04:41<8:12:08, 12.30s/it] + 67%|██████▋ | 4979/7378 [17:04:54<8:14:06, 12.36s/it] + +{'loss': 0.4211, 'learning_rate': 5.052050118760297e-06, 'epoch': 0.67} + + 67%|██████▋ | 4979/7378 [17:04:54<8:14:06, 12.36s/it] + 67%|██████▋ | 4980/7378 [17:05:06<8:13:10, 12.34s/it] + +{'loss': 0.4986, 'learning_rate': 5.048235511018928e-06, 'epoch': 0.67} + + 67%|██████▋ | 4980/7378 [17:05:06<8:13:10, 12.34s/it] + 68%|██████▊ | 4981/7378 [17:05:19<8:16:44, 12.43s/it] + +{'loss': 0.4731, 'learning_rate': 5.044421857652561e-06, 'epoch': 0.68} + + 68%|██████▊ | 4981/7378 [17:05:19<8:16:44, 12.43s/it] + 68%|██████▊ | 4982/7378 [17:05:31<8:16:47, 12.44s/it] + +{'loss': 0.433, 'learning_rate': 5.0406091593962195e-06, 'epoch': 0.68} + + 68%|██████▊ | 4982/7378 [17:05:31<8:16:47, 12.44s/it] + 68%|██████▊ | 4983/7378 [17:05:44<8:18:26, 12.49s/it] + +{'loss': 0.4641, 'learning_rate': 5.036797416984736e-06, 'epoch': 0.68} + + 68%|██████▊ | 4983/7378 [17:05:44<8:18:26, 12.49s/it] + 68%|██████▊ | 4984/7378 [17:05:56<8:20:03, 12.53s/it] + +{'loss': 0.455, 'learning_rate': 5.032986631152772e-06, 'epoch': 0.68} + + 68%|██████▊ | 4984/7378 [17:05:56<8:20:03, 12.53s/it] + 68%|██████▊ | 4985/7378 [17:06:08<8:11:52, 12.33s/it] + +{'loss': 0.4972, 'learning_rate': 5.029176802634794e-06, 'epoch': 
0.68} + + 68%|██████▊ | 4985/7378 [17:06:08<8:11:52, 12.33s/it] + 68%|██████▊ | 4986/7378 [17:06:20<8:07:39, 12.23s/it] + +{'loss': 0.4417, 'learning_rate': 5.025367932165086e-06, 'epoch': 0.68} + + 68%|██████▊ | 4986/7378 [17:06:20<8:07:39, 12.23s/it] + 68%|██████▊ | 4987/7378 [17:06:33<8:09:10, 12.28s/it] + +{'loss': 0.4434, 'learning_rate': 5.021560020477749e-06, 'epoch': 0.68} + + 68%|██████▊ | 4987/7378 [17:06:33<8:09:10, 12.28s/it] + 68%|██████▊ | 4988/7378 [17:06:45<8:11:08, 12.33s/it] + +{'loss': 0.4703, 'learning_rate': 5.017753068306692e-06, 'epoch': 0.68} + + 68%|██████▊ | 4988/7378 [17:06:45<8:11:08, 12.33s/it] + 68%|██████▊ | 4989/7378 [17:06:57<8:05:55, 12.20s/it] + +{'loss': 0.4585, 'learning_rate': 5.013947076385657e-06, 'epoch': 0.68} + + 68%|██████▊ | 4989/7378 [17:06:57<8:05:55, 12.20s/it] + 68%|██████▊ | 4990/7378 [17:07:09<8:09:42, 12.30s/it] + +{'loss': 0.4279, 'learning_rate': 5.010142045448181e-06, 'epoch': 0.68} + + 68%|██████▊ | 4990/7378 [17:07:09<8:09:42, 12.30s/it] + 68%|██████▊ | 4991/7378 [17:07:22<8:08:18, 12.27s/it] + +{'loss': 0.4235, 'learning_rate': 5.006337976227627e-06, 'epoch': 0.68} + + 68%|██████▊ | 4991/7378 [17:07:22<8:08:18, 12.27s/it] + 68%|██████▊ | 4992/7378 [17:07:34<8:05:54, 12.22s/it] + +{'loss': 0.4337, 'learning_rate': 5.002534869457165e-06, 'epoch': 0.68} + + 68%|██████▊ | 4992/7378 [17:07:34<8:05:54, 12.22s/it] + 68%|██████▊ | 4993/7378 [17:07:46<8:10:45, 12.35s/it] + +{'loss': 0.4443, 'learning_rate': 4.998732725869791e-06, 'epoch': 0.68} + + 68%|██████▊ | 4993/7378 [17:07:46<8:10:45, 12.35s/it] + 68%|██████▊ | 4994/7378 [17:07:59<8:12:23, 12.39s/it] + +{'loss': 0.4264, 'learning_rate': 4.9949315461983075e-06, 'epoch': 0.68} + + 68%|██████▊ | 4994/7378 [17:07:59<8:12:23, 12.39s/it] + 68%|██████▊ | 4995/7378 [17:08:11<8:06:33, 12.25s/it] + +{'loss': 0.497, 'learning_rate': 4.99113133117533e-06, 'epoch': 0.68} + + 68%|██████▊ | 4995/7378 [17:08:11<8:06:33, 12.25s/it] + 68%|██████▊ | 4996/7378 [17:08:23<8:03:06, 
12.17s/it] + +{'loss': 0.3875, 'learning_rate': 4.9873320815332906e-06, 'epoch': 0.68} + + 68%|██████▊ | 4996/7378 [17:08:23<8:03:06, 12.17s/it] + 68%|██████▊ | 4997/7378 [17:08:35<8:02:29, 12.16s/it] + +{'loss': 0.5371, 'learning_rate': 4.9835337980044315e-06, 'epoch': 0.68} + + 68%|██████▊ | 4997/7378 [17:08:35<8:02:29, 12.16s/it] + 68%|██████▊ | 4998/7378 [17:08:47<8:01:25, 12.14s/it] + +{'loss': 0.4324, 'learning_rate': 4.97973648132082e-06, 'epoch': 0.68} + + 68%|██████▊ | 4998/7378 [17:08:47<8:01:25, 12.14s/it] + 68%|██████▊ | 4999/7378 [17:09:00<8:10:49, 12.38s/it] + +{'loss': 0.4536, 'learning_rate': 4.975940132214326e-06, 'epoch': 0.68} + + 68%|██████▊ | 4999/7378 [17:09:00<8:10:49, 12.38s/it] + 68%|██████▊ | 5000/7378 [17:09:12<8:09:32, 12.35s/it] + +{'loss': 0.493, 'learning_rate': 4.972144751416632e-06, 'epoch': 0.68} + + 68%|██████▊ | 5000/7378 [17:09:12<8:09:32, 12.35s/it] + 68%|██████▊ | 5001/7378 [17:09:24<8:06:38, 12.28s/it] + +{'loss': 0.4676, 'learning_rate': 4.968350339659247e-06, 'epoch': 0.68} + + 68%|██████▊ | 5001/7378 [17:09:24<8:06:38, 12.28s/it] + 68%|██████▊ | 5002/7378 [17:09:37<8:09:42, 12.37s/it] + +{'loss': 0.4747, 'learning_rate': 4.964556897673475e-06, 'epoch': 0.68} + + 68%|██████▊ | 5002/7378 [17:09:37<8:09:42, 12.37s/it] + 68%|██████▊ | 5003/7378 [17:09:49<8:09:02, 12.35s/it] + +{'loss': 0.5036, 'learning_rate': 4.960764426190451e-06, 'epoch': 0.68} + + 68%|██████▊ | 5003/7378 [17:09:49<8:09:02, 12.35s/it] + 68%|██████▊ | 5004/7378 [17:10:02<8:08:42, 12.35s/it] + +{'loss': 0.462, 'learning_rate': 4.9569729259411104e-06, 'epoch': 0.68} + + 68%|██████▊ | 5004/7378 [17:10:02<8:08:42, 12.35s/it] + 68%|██████▊ | 5005/7378 [17:10:14<8:11:48, 12.44s/it] + +{'loss': 0.4362, 'learning_rate': 4.953182397656206e-06, 'epoch': 0.68} + + 68%|██████▊ | 5005/7378 [17:10:14<8:11:48, 12.44s/it] + 68%|██████▊ | 5006/7378 [17:10:26<8:09:43, 12.39s/it] + +{'loss': 0.4353, 'learning_rate': 4.9493928420663e-06, 'epoch': 0.68} + + 68%|██████▊ | 
5006/7378 [17:10:26<8:09:43, 12.39s/it] + 68%|██████▊ | 5007/7378 [17:10:39<8:05:30, 12.29s/it] + +{'loss': 0.4664, 'learning_rate': 4.945604259901771e-06, 'epoch': 0.68} + + 68%|██████▊ | 5007/7378 [17:10:39<8:05:30, 12.29s/it] + 68%|██████▊ | 5008/7378 [17:10:51<8:09:59, 12.40s/it] + +{'loss': 0.4135, 'learning_rate': 4.941816651892813e-06, 'epoch': 0.68} + + 68%|██████▊ | 5008/7378 [17:10:51<8:09:59, 12.40s/it] + 68%|██████▊ | 5009/7378 [17:11:04<8:09:13, 12.39s/it] + +{'loss': 0.4579, 'learning_rate': 4.938030018769424e-06, 'epoch': 0.68} + + 68%|██████▊ | 5009/7378 [17:11:04<8:09:13, 12.39s/it] + 68%|██████▊ | 5010/7378 [17:11:16<8:08:20, 12.37s/it] + +{'loss': 0.4438, 'learning_rate': 4.93424436126142e-06, 'epoch': 0.68} + + 68%|██████▊ | 5010/7378 [17:11:16<8:08:20, 12.37s/it] + 68%|██████▊ | 5011/7378 [17:11:28<8:04:23, 12.28s/it] + +{'loss': 0.3966, 'learning_rate': 4.930459680098423e-06, 'epoch': 0.68} + + 68%|██████▊ | 5011/7378 [17:11:28<8:04:23, 12.28s/it] + 68%|██████▊ | 5012/7378 [17:11:41<8:08:34, 12.39s/it] + +{'loss': 0.4041, 'learning_rate': 4.926675976009878e-06, 'epoch': 0.68} + + 68%|██████▊ | 5012/7378 [17:11:41<8:08:34, 12.39s/it] + 68%|██████▊ | 5013/7378 [17:11:53<8:05:55, 12.33s/it] + +{'loss': 0.4398, 'learning_rate': 4.92289324972503e-06, 'epoch': 0.68} + + 68%|██████▊ | 5013/7378 [17:11:53<8:05:55, 12.33s/it] + 68%|██████▊ | 5014/7378 [17:12:05<8:07:59, 12.39s/it] + +{'loss': 0.4211, 'learning_rate': 4.919111501972943e-06, 'epoch': 0.68} + + 68%|██████▊ | 5014/7378 [17:12:05<8:07:59, 12.39s/it] + 68%|██████▊ | 5015/7378 [17:12:18<8:07:12, 12.37s/it] + +{'loss': 0.4741, 'learning_rate': 4.915330733482486e-06, 'epoch': 0.68} + + 68%|██████▊ | 5015/7378 [17:12:18<8:07:12, 12.37s/it] + 68%|██████▊ | 5016/7378 [17:12:30<8:03:22, 12.28s/it] + +{'loss': 0.4407, 'learning_rate': 4.911550944982343e-06, 'epoch': 0.68} + + 68%|██████▊ | 5016/7378 [17:12:30<8:03:22, 12.28s/it] + 68%|██████▊ | 5017/7378 [17:12:42<8:05:21, 12.33s/it] + +{'loss': 
0.384, 'learning_rate': 4.9077721372010135e-06, 'epoch': 0.68} + + 68%|██████▊ | 5017/7378 [17:12:42<8:05:21, 12.33s/it] + 68%|██████▊ | 5018/7378 [17:12:54<8:03:52, 12.30s/it] + +{'loss': 0.4389, 'learning_rate': 4.9039943108668e-06, 'epoch': 0.68} + + 68%|██████▊ | 5018/7378 [17:12:54<8:03:52, 12.30s/it] + 68%|██████▊ | 5019/7378 [17:13:07<8:02:51, 12.28s/it] + +{'loss': 0.4081, 'learning_rate': 4.90021746670782e-06, 'epoch': 0.68} + + 68%|██████▊ | 5019/7378 [17:13:07<8:02:51, 12.28s/it] + 68%|██████▊ | 5020/7378 [17:13:19<8:06:05, 12.37s/it] + +{'loss': 0.4808, 'learning_rate': 4.896441605451998e-06, 'epoch': 0.68} + + 68%|██████▊ | 5020/7378 [17:13:19<8:06:05, 12.37s/it] + 68%|██████▊ | 5021/7378 [17:13:31<8:04:25, 12.33s/it] + +{'loss': 0.3819, 'learning_rate': 4.892666727827079e-06, 'epoch': 0.68} + + 68%|██████▊ | 5021/7378 [17:13:31<8:04:25, 12.33s/it] + 68%|██████▊ | 5022/7378 [17:13:44<8:04:33, 12.34s/it] + +{'loss': 0.433, 'learning_rate': 4.888892834560608e-06, 'epoch': 0.68} + + 68%|██████▊ | 5022/7378 [17:13:44<8:04:33, 12.34s/it] + 68%|██████▊ | 5023/7378 [17:13:56<8:01:08, 12.26s/it] + +{'loss': 0.4299, 'learning_rate': 4.885119926379943e-06, 'epoch': 0.68} + + 68%|██████▊ | 5023/7378 [17:13:56<8:01:08, 12.26s/it] + 68%|██████▊ | 5024/7378 [17:14:09<8:05:54, 12.39s/it] + +{'loss': 0.4623, 'learning_rate': 4.8813480040122526e-06, 'epoch': 0.68} + + 68%|██████▊ | 5024/7378 [17:14:09<8:05:54, 12.39s/it] + 68%|██████▊ | 5025/7378 [17:14:21<8:04:27, 12.35s/it] + +{'loss': 0.4211, 'learning_rate': 4.877577068184513e-06, 'epoch': 0.68} + + 68%|██████▊ | 5025/7378 [17:14:21<8:04:27, 12.35s/it] + 68%|██████▊ | 5026/7378 [17:14:33<8:04:05, 12.35s/it] + +{'loss': 0.4716, 'learning_rate': 4.873807119623521e-06, 'epoch': 0.68} + + 68%|██████▊ | 5026/7378 [17:14:33<8:04:05, 12.35s/it] + 68%|██████▊ | 5027/7378 [17:14:45<8:00:34, 12.26s/it] + +{'loss': 0.3737, 'learning_rate': 4.870038159055871e-06, 'epoch': 0.68} + + 68%|██████▊ | 5027/7378 [17:14:45<8:00:34, 
12.26s/it] + 68%|██████▊ | 5028/7378 [17:14:58<8:07:00, 12.43s/it] + +{'loss': 0.4794, 'learning_rate': 4.86627018720797e-06, 'epoch': 0.68} + + 68%|██████▊ | 5028/7378 [17:14:58<8:07:00, 12.43s/it] + 68%|██████▊ | 5029/7378 [17:15:11<8:08:41, 12.48s/it] + +{'loss': 0.478, 'learning_rate': 4.862503204806031e-06, 'epoch': 0.68} + + 68%|██████▊ | 5029/7378 [17:15:11<8:08:41, 12.48s/it] + 68%|██████▊ | 5030/7378 [17:15:23<8:09:39, 12.51s/it] + +{'loss': 0.4115, 'learning_rate': 4.858737212576091e-06, 'epoch': 0.68} + + 68%|██████▊ | 5030/7378 [17:15:23<8:09:39, 12.51s/it] + 68%|██████▊ | 5031/7378 [17:15:35<8:01:47, 12.32s/it] + +{'loss': 0.4256, 'learning_rate': 4.854972211243981e-06, 'epoch': 0.68} + + 68%|██████▊ | 5031/7378 [17:15:35<8:01:47, 12.32s/it] + 68%|██████▊ | 5032/7378 [17:15:48<8:03:07, 12.36s/it] + +{'loss': 0.4727, 'learning_rate': 4.851208201535347e-06, 'epoch': 0.68} + + 68%|██████▊ | 5032/7378 [17:15:48<8:03:07, 12.36s/it] + 68%|██████▊ | 5033/7378 [17:16:00<8:03:42, 12.38s/it] + +{'loss': 0.4691, 'learning_rate': 4.84744518417564e-06, 'epoch': 0.68} + + 68%|██████▊ | 5033/7378 [17:16:00<8:03:42, 12.38s/it] + 68%|██████▊ | 5034/7378 [17:16:12<7:57:26, 12.22s/it] + +{'loss': 0.4186, 'learning_rate': 4.843683159890121e-06, 'epoch': 0.68} + + 68%|██████▊ | 5034/7378 [17:16:12<7:57:26, 12.22s/it] + 68%|██████▊ | 5035/7378 [17:16:24<7:57:25, 12.23s/it] + +{'loss': 0.4281, 'learning_rate': 4.83992212940387e-06, 'epoch': 0.68} + + 68%|██████▊ | 5035/7378 [17:16:24<7:57:25, 12.23s/it] + 68%|██████▊ | 5036/7378 [17:16:36<7:51:23, 12.08s/it] + +{'loss': 0.4199, 'learning_rate': 4.83616209344176e-06, 'epoch': 0.68} + + 68%|██████▊ | 5036/7378 [17:16:36<7:51:23, 12.08s/it] + 68%|██████▊ | 5037/7378 [17:16:48<7:52:45, 12.12s/it] + +{'loss': 0.4731, 'learning_rate': 4.832403052728481e-06, 'epoch': 0.68} + + 68%|██████▊ | 5037/7378 [17:16:48<7:52:45, 12.12s/it] + 68%|██████▊ | 5038/7378 [17:17:00<7:49:12, 12.03s/it] + +{'loss': 0.4915, 'learning_rate': 
4.828645007988524e-06, 'epoch': 0.68} + + 68%|██████▊ | 5038/7378 [17:17:00<7:49:12, 12.03s/it] + 68%|██████▊ | 5039/7378 [17:17:13<7:57:17, 12.24s/it] + +{'loss': 0.4075, 'learning_rate': 4.824887959946203e-06, 'epoch': 0.68} + + 68%|██████▊ | 5039/7378 [17:17:13<7:57:17, 12.24s/it] + 68%|██████▊ | 5040/7378 [17:17:25<7:55:11, 12.19s/it] + +{'loss': 0.4731, 'learning_rate': 4.821131909325624e-06, 'epoch': 0.68} + + 68%|██████▊ | 5040/7378 [17:17:25<7:55:11, 12.19s/it] + 68%|██████▊ | 5041/7378 [17:17:37<7:59:41, 12.32s/it] + +{'loss': 0.4445, 'learning_rate': 4.817376856850707e-06, 'epoch': 0.68} + + 68%|██████▊ | 5041/7378 [17:17:37<7:59:41, 12.32s/it] + 68%|██████▊ | 5042/7378 [17:17:50<8:01:43, 12.37s/it] + +{'loss': 0.4729, 'learning_rate': 4.813622803245181e-06, 'epoch': 0.68} + + 68%|██████▊ | 5042/7378 [17:17:50<8:01:43, 12.37s/it] + 68%|██████▊ | 5043/7378 [17:18:02<8:02:32, 12.40s/it] + +{'loss': 0.4794, 'learning_rate': 4.809869749232577e-06, 'epoch': 0.68} + + 68%|██████▊ | 5043/7378 [17:18:02<8:02:32, 12.40s/it] + 68%|██████▊ | 5044/7378 [17:18:14<7:59:36, 12.33s/it] + +{'loss': 0.4476, 'learning_rate': 4.806117695536241e-06, 'epoch': 0.68} + + 68%|██████▊ | 5044/7378 [17:18:14<7:59:36, 12.33s/it] + 68%|██████▊ | 5045/7378 [17:18:26<7:55:10, 12.22s/it] + +{'loss': 0.4393, 'learning_rate': 4.802366642879326e-06, 'epoch': 0.68} + + 68%|██████▊ | 5045/7378 [17:18:26<7:55:10, 12.22s/it] + 68%|██████▊ | 5046/7378 [17:18:39<7:54:41, 12.21s/it] + +{'loss': 0.5126, 'learning_rate': 4.798616591984784e-06, 'epoch': 0.68} + + 68%|██████▊ | 5046/7378 [17:18:39<7:54:41, 12.21s/it] + 68%|██████▊ | 5047/7378 [17:18:51<7:57:39, 12.29s/it] + +{'loss': 0.4069, 'learning_rate': 4.79486754357538e-06, 'epoch': 0.68} + + 68%|██████▊ | 5047/7378 [17:18:51<7:57:39, 12.29s/it] + 68%|██████▊ | 5048/7378 [17:19:03<7:53:29, 12.19s/it] + +{'loss': 0.4866, 'learning_rate': 4.791119498373683e-06, 'epoch': 0.68} + + 68%|██████▊ | 5048/7378 [17:19:03<7:53:29, 12.19s/it] + 68%|██████▊ 
| 5049/7378 [17:19:16<8:02:37, 12.43s/it] + +{'loss': 0.5007, 'learning_rate': 4.787372457102067e-06, 'epoch': 0.68} + + 68%|██████▊ | 5049/7378 [17:19:16<8:02:37, 12.43s/it] + 68%|██████▊ | 5050/7378 [17:19:28<8:02:09, 12.43s/it] + +{'loss': 0.4974, 'learning_rate': 4.783626420482724e-06, 'epoch': 0.68} + + 68%|██████▊ | 5050/7378 [17:19:28<8:02:09, 12.43s/it] + 68%|██████▊ | 5051/7378 [17:19:41<8:01:24, 12.41s/it] + +{'loss': 0.4616, 'learning_rate': 4.779881389237638e-06, 'epoch': 0.68} + + 68%|██████▊ | 5051/7378 [17:19:41<8:01:24, 12.41s/it] + 68%|██████▊ | 5052/7378 [17:19:53<8:01:49, 12.43s/it] + +{'loss': 0.5345, 'learning_rate': 4.776137364088608e-06, 'epoch': 0.68} + + 68%|██████▊ | 5052/7378 [17:19:53<8:01:49, 12.43s/it] + 68%|██████▊ | 5053/7378 [17:20:06<8:02:42, 12.46s/it] + +{'loss': 0.4255, 'learning_rate': 4.772394345757228e-06, 'epoch': 0.68} + + 68%|██████▊ | 5053/7378 [17:20:06<8:02:42, 12.46s/it] + 69%|██████▊ | 5054/7378 [17:20:17<7:53:43, 12.23s/it] + +{'loss': 0.418, 'learning_rate': 4.768652334964919e-06, 'epoch': 0.69} + + 69%|██████▊ | 5054/7378 [17:20:18<7:53:43, 12.23s/it] + 69%|██████▊ | 5055/7378 [17:20:30<7:56:29, 12.31s/it] + +{'loss': 0.478, 'learning_rate': 4.7649113324328854e-06, 'epoch': 0.69} + + 69%|██████▊ | 5055/7378 [17:20:30<7:56:29, 12.31s/it] + 69%|██████▊ | 5056/7378 [17:20:42<7:52:20, 12.21s/it] + +{'loss': 0.4325, 'learning_rate': 4.761171338882151e-06, 'epoch': 0.69} + + 69%|██████▊ | 5056/7378 [17:20:42<7:52:20, 12.21s/it] + 69%|██████▊ | 5057/7378 [17:20:55<8:00:00, 12.41s/it] + +{'loss': 0.4817, 'learning_rate': 4.75743235503354e-06, 'epoch': 0.69} + + 69%|██████▊ | 5057/7378 [17:20:55<8:00:00, 12.41s/it] + 69%|██████▊ | 5058/7378 [17:21:07<7:57:35, 12.35s/it] + +{'loss': 0.4806, 'learning_rate': 4.753694381607679e-06, 'epoch': 0.69} + + 69%|██████▊ | 5058/7378 [17:21:07<7:57:35, 12.35s/it] + 69%|██████▊ | 5059/7378 [17:21:19<7:56:07, 12.32s/it] + +{'loss': 0.3845, 'learning_rate': 4.74995741932501e-06, 'epoch': 
0.69} + + 69%|██████▊ | 5059/7378 [17:21:19<7:56:07, 12.32s/it] + 69%|██████▊ | 5060/7378 [17:21:31<7:54:05, 12.27s/it] + +{'loss': 0.3952, 'learning_rate': 4.746221468905773e-06, 'epoch': 0.69} + + 69%|██████▊ | 5060/7378 [17:21:31<7:54:05, 12.27s/it] + 69%|██████▊ | 5061/7378 [17:21:44<7:52:58, 12.25s/it] + +{'loss': 0.3891, 'learning_rate': 4.742486531070011e-06, 'epoch': 0.69} + + 69%|██████▊ | 5061/7378 [17:21:44<7:52:58, 12.25s/it] + 69%|██████▊ | 5062/7378 [17:21:56<7:50:49, 12.20s/it] + +{'loss': 0.4395, 'learning_rate': 4.7387526065375725e-06, 'epoch': 0.69} + + 69%|██████▊ | 5062/7378 [17:21:56<7:50:49, 12.20s/it] + 69%|██████▊ | 5063/7378 [17:22:08<7:50:11, 12.19s/it] + +{'loss': 0.3828, 'learning_rate': 4.7350196960281205e-06, 'epoch': 0.69} + + 69%|██████▊ | 5063/7378 [17:22:08<7:50:11, 12.19s/it] + 69%|██████▊ | 5064/7378 [17:22:20<7:46:57, 12.11s/it] + +{'loss': 0.4506, 'learning_rate': 4.73128780026111e-06, 'epoch': 0.69} + + 69%|██████▊ | 5064/7378 [17:22:20<7:46:57, 12.11s/it] + 69%|██████▊ | 5065/7378 [17:22:32<7:48:33, 12.15s/it] + +{'loss': 0.472, 'learning_rate': 4.727556919955808e-06, 'epoch': 0.69} + + 69%|██████▊ | 5065/7378 [17:22:32<7:48:33, 12.15s/it] + 69%|██████▊ | 5066/7378 [17:22:44<7:49:07, 12.17s/it] + +{'loss': 0.4436, 'learning_rate': 4.723827055831281e-06, 'epoch': 0.69} + + 69%|██████▊ | 5066/7378 [17:22:44<7:49:07, 12.17s/it] + 69%|██████▊ | 5067/7378 [17:22:57<7:50:30, 12.22s/it] + +{'loss': 0.4509, 'learning_rate': 4.720098208606397e-06, 'epoch': 0.69} + + 69%|██████▊ | 5067/7378 [17:22:57<7:50:30, 12.22s/it] + 69%|██████▊ | 5068/7378 [17:23:09<7:50:08, 12.21s/it] + +{'loss': 0.441, 'learning_rate': 4.716370378999844e-06, 'epoch': 0.69} + + 69%|██████▊ | 5068/7378 [17:23:09<7:50:08, 12.21s/it] + 69%|██████▊ | 5069/7378 [17:23:21<7:48:44, 12.18s/it] + +{'loss': 0.4697, 'learning_rate': 4.712643567730096e-06, 'epoch': 0.69} + + 69%|██████▊ | 5069/7378 [17:23:21<7:48:44, 12.18s/it] + 69%|██████▊ | 5070/7378 [17:23:33<7:52:14, 
12.28s/it] + +{'loss': 0.4325, 'learning_rate': 4.708917775515439e-06, 'epoch': 0.69} + + 69%|██████▊ | 5070/7378 [17:23:33<7:52:14, 12.28s/it] + 69%|██████▊ | 5071/7378 [17:23:46<7:53:22, 12.31s/it] + +{'loss': 0.4436, 'learning_rate': 4.7051930030739566e-06, 'epoch': 0.69} + + 69%|██████▊ | 5071/7378 [17:23:46<7:53:22, 12.31s/it] + 69%|██████▊ | 5072/7378 [17:23:58<7:48:22, 12.19s/it] + +{'loss': 0.3988, 'learning_rate': 4.701469251123548e-06, 'epoch': 0.69} + + 69%|██████▊ | 5072/7378 [17:23:58<7:48:22, 12.19s/it] + 69%|██████▉ | 5073/7378 [17:24:10<7:45:30, 12.12s/it] + +{'loss': 0.4251, 'learning_rate': 4.6977465203819036e-06, 'epoch': 0.69} + + 69%|██████▉ | 5073/7378 [17:24:10<7:45:30, 12.12s/it] + 69%|██████▉ | 5074/7378 [17:24:22<7:48:46, 12.21s/it] + +{'loss': 0.4246, 'learning_rate': 4.694024811566523e-06, 'epoch': 0.69} + + 69%|██████▉ | 5074/7378 [17:24:22<7:48:46, 12.21s/it] + 69%|██████▉ | 5075/7378 [17:24:35<7:59:08, 12.48s/it] + +{'loss': 0.387, 'learning_rate': 4.690304125394707e-06, 'epoch': 0.69} + + 69%|██████▉ | 5075/7378 [17:24:35<7:59:08, 12.48s/it] + 69%|██████▉ | 5076/7378 [17:24:47<7:56:47, 12.43s/it] + +{'loss': 0.4547, 'learning_rate': 4.686584462583555e-06, 'epoch': 0.69} + + 69%|██████▉ | 5076/7378 [17:24:48<7:56:47, 12.43s/it] + 69%|██████▉ | 5077/7378 [17:25:00<7:54:19, 12.37s/it] + +{'loss': 0.4649, 'learning_rate': 4.6828658238499805e-06, 'epoch': 0.69} + + 69%|██████▉ | 5077/7378 [17:25:00<7:54:19, 12.37s/it] + 69%|██████▉ | 5078/7378 [17:25:12<7:52:08, 12.32s/it] + +{'loss': 0.5265, 'learning_rate': 4.679148209910689e-06, 'epoch': 0.69} + + 69%|██████▉ | 5078/7378 [17:25:12<7:52:08, 12.32s/it] + 69%|██████▉ | 5079/7378 [17:25:24<7:48:10, 12.22s/it] + +{'loss': 0.448, 'learning_rate': 4.675431621482195e-06, 'epoch': 0.69} + + 69%|██████▉ | 5079/7378 [17:25:24<7:48:10, 12.22s/it] + 69%|██████▉ | 5080/7378 [17:25:36<7:49:25, 12.26s/it] + +{'loss': 0.4614, 'learning_rate': 4.671716059280806e-06, 'epoch': 0.69} + + 69%|██████▉ | 
5080/7378 [17:25:36<7:49:25, 12.26s/it] + 69%|██████▉ | 5081/7378 [17:25:49<7:51:03, 12.30s/it] + +{'loss': 0.4234, 'learning_rate': 4.668001524022648e-06, 'epoch': 0.69} + + 69%|██████▉ | 5081/7378 [17:25:49<7:51:03, 12.30s/it] + 69%|██████▉ | 5082/7378 [17:26:01<7:46:02, 12.18s/it] + +{'loss': 0.4389, 'learning_rate': 4.664288016423635e-06, 'epoch': 0.69} + + 69%|██████▉ | 5082/7378 [17:26:01<7:46:02, 12.18s/it] + 69%|██████▉ | 5083/7378 [17:26:13<7:43:11, 12.11s/it] + +{'loss': 0.4402, 'learning_rate': 4.660575537199487e-06, 'epoch': 0.69} + + 69%|██████▉ | 5083/7378 [17:26:13<7:43:11, 12.11s/it] + 69%|██████▉ | 5084/7378 [17:26:25<7:46:15, 12.20s/it] + +{'loss': 0.4497, 'learning_rate': 4.656864087065726e-06, 'epoch': 0.69} + + 69%|██████▉ | 5084/7378 [17:26:25<7:46:15, 12.20s/it] + 69%|██████▉ | 5085/7378 [17:26:38<7:50:58, 12.32s/it] + +{'loss': 0.4853, 'learning_rate': 4.653153666737672e-06, 'epoch': 0.69} + + 69%|██████▉ | 5085/7378 [17:26:38<7:50:58, 12.32s/it] + 69%|██████▉ | 5086/7378 [17:26:50<7:47:57, 12.25s/it] + +{'loss': 0.4892, 'learning_rate': 4.649444276930458e-06, 'epoch': 0.69} + + 69%|██████▉ | 5086/7378 [17:26:50<7:47:57, 12.25s/it] + 69%|██████▉ | 5087/7378 [17:27:02<7:48:51, 12.28s/it] + +{'loss': 0.4506, 'learning_rate': 4.645735918359009e-06, 'epoch': 0.69} + + 69%|██████▉ | 5087/7378 [17:27:02<7:48:51, 12.28s/it] + 69%|██████▉ | 5088/7378 [17:27:14<7:50:02, 12.32s/it] + +{'loss': 0.4455, 'learning_rate': 4.642028591738046e-06, 'epoch': 0.69} + + 69%|██████▉ | 5088/7378 [17:27:14<7:50:02, 12.32s/it] + 69%|██████▉ | 5089/7378 [17:27:27<7:57:20, 12.51s/it] + +{'loss': 0.5019, 'learning_rate': 4.638322297782109e-06, 'epoch': 0.69} + + 69%|██████▉ | 5089/7378 [17:27:27<7:57:20, 12.51s/it] + 69%|██████▉ | 5090/7378 [17:27:40<7:54:10, 12.43s/it] + +{'loss': 0.4133, 'learning_rate': 4.634617037205517e-06, 'epoch': 0.69} + + 69%|██████▉ | 5090/7378 [17:27:40<7:54:10, 12.43s/it] + 69%|██████▉ | 5091/7378 [17:27:52<7:50:19, 12.34s/it] + +{'loss': 
0.4559, 'learning_rate': 4.630912810722411e-06, 'epoch': 0.69} + + 69%|██████▉ | 5091/7378 [17:27:52<7:50:19, 12.34s/it] + 69%|██████▉ | 5092/7378 [17:28:04<7:51:10, 12.37s/it] + +{'loss': 0.4359, 'learning_rate': 4.627209619046718e-06, 'epoch': 0.69} + + 69%|██████▉ | 5092/7378 [17:28:04<7:51:10, 12.37s/it] + 69%|██████▉ | 5093/7378 [17:28:16<7:47:06, 12.27s/it] + +{'loss': 0.4767, 'learning_rate': 4.6235074628921705e-06, 'epoch': 0.69} + + 69%|██████▉ | 5093/7378 [17:28:16<7:47:06, 12.27s/it] + 69%|██████▉ | 5094/7378 [17:28:28<7:44:20, 12.20s/it] + +{'loss': 0.4353, 'learning_rate': 4.6198063429722995e-06, 'epoch': 0.69} + + 69%|██████▉ | 5094/7378 [17:28:28<7:44:20, 12.20s/it] + 69%|██████▉ | 5095/7378 [17:28:41<7:45:31, 12.23s/it] + +{'loss': 0.3994, 'learning_rate': 4.616106260000437e-06, 'epoch': 0.69} + + 69%|██████▉ | 5095/7378 [17:28:41<7:45:31, 12.23s/it] + 69%|██████▉ | 5096/7378 [17:28:52<7:41:33, 12.14s/it] + +{'loss': 0.4704, 'learning_rate': 4.612407214689721e-06, 'epoch': 0.69} + + 69%|██████▉ | 5096/7378 [17:28:52<7:41:33, 12.14s/it] + 69%|██████▉ | 5097/7378 [17:29:05<7:42:03, 12.15s/it] + +{'loss': 0.4034, 'learning_rate': 4.608709207753081e-06, 'epoch': 0.69} + + 69%|██████▉ | 5097/7378 [17:29:05<7:42:03, 12.15s/it] + 69%|██████▉ | 5098/7378 [17:29:17<7:44:41, 12.23s/it] + +{'loss': 0.4158, 'learning_rate': 4.605012239903253e-06, 'epoch': 0.69} + + 69%|██████▉ | 5098/7378 [17:29:17<7:44:41, 12.23s/it] + 69%|██████▉ | 5099/7378 [17:29:29<7:43:51, 12.21s/it] + +{'loss': 0.4894, 'learning_rate': 4.601316311852761e-06, 'epoch': 0.69} + + 69%|██████▉ | 5099/7378 [17:29:29<7:43:51, 12.21s/it] + 69%|██████▉ | 5100/7378 [17:29:42<7:47:17, 12.31s/it] + +{'loss': 0.483, 'learning_rate': 4.597621424313948e-06, 'epoch': 0.69} + + 69%|██████▉ | 5100/7378 [17:29:42<7:47:17, 12.31s/it] + 69%|██████▉ | 5101/7378 [17:29:54<7:44:13, 12.23s/it] + +{'loss': 0.3778, 'learning_rate': 4.593927577998941e-06, 'epoch': 0.69} + + 69%|██████▉ | 5101/7378 
[17:29:54<7:44:13, 12.23s/it] + 69%|██████▉ | 5102/7378 [17:30:06<7:46:15, 12.29s/it] + +{'loss': 0.4738, 'learning_rate': 4.590234773619671e-06, 'epoch': 0.69} + + 69%|██████▉ | 5102/7378 [17:30:06<7:46:15, 12.29s/it] + 69%|██████▉ | 5103/7378 [17:30:19<7:46:23, 12.30s/it] + +{'loss': 0.4334, 'learning_rate': 4.586543011887869e-06, 'epoch': 0.69} + + 69%|██████▉ | 5103/7378 [17:30:19<7:46:23, 12.30s/it] + 69%|██████▉ | 5104/7378 [17:30:31<7:47:32, 12.34s/it] + +{'loss': 0.4121, 'learning_rate': 4.582852293515057e-06, 'epoch': 0.69} + + 69%|██████▉ | 5104/7378 [17:30:31<7:47:32, 12.34s/it] + 69%|██████▉ | 5105/7378 [17:30:44<7:49:51, 12.40s/it] + +{'loss': 0.476, 'learning_rate': 4.579162619212576e-06, 'epoch': 0.69} + + 69%|██████▉ | 5105/7378 [17:30:44<7:49:51, 12.40s/it] + 69%|██████▉ | 5106/7378 [17:30:56<7:47:44, 12.35s/it] + +{'loss': 0.4304, 'learning_rate': 4.575473989691546e-06, 'epoch': 0.69} + + 69%|██████▉ | 5106/7378 [17:30:56<7:47:44, 12.35s/it] + 69%|██████▉ | 5107/7378 [17:31:08<7:43:26, 12.24s/it] + +{'loss': 0.4493, 'learning_rate': 4.571786405662893e-06, 'epoch': 0.69} + + 69%|██████▉ | 5107/7378 [17:31:08<7:43:26, 12.24s/it] + 69%|██████▉ | 5108/7378 [17:31:20<7:47:32, 12.36s/it] + +{'loss': 0.435, 'learning_rate': 4.56809986783734e-06, 'epoch': 0.69} + + 69%|██████▉ | 5108/7378 [17:31:20<7:47:32, 12.36s/it] + 69%|██████▉ | 5109/7378 [17:31:32<7:44:18, 12.28s/it] + +{'loss': 0.4601, 'learning_rate': 4.564414376925407e-06, 'epoch': 0.69} + + 69%|██████▉ | 5109/7378 [17:31:32<7:44:18, 12.28s/it] + 69%|██████▉ | 5110/7378 [17:31:45<7:42:47, 12.24s/it] + +{'loss': 0.4105, 'learning_rate': 4.560729933637422e-06, 'epoch': 0.69} + + 69%|██████▉ | 5110/7378 [17:31:45<7:42:47, 12.24s/it] + 69%|██████▉ | 5111/7378 [17:31:57<7:43:27, 12.27s/it] + +{'loss': 0.4391, 'learning_rate': 4.5570465386834995e-06, 'epoch': 0.69} + + 69%|██████▉ | 5111/7378 [17:31:57<7:43:27, 12.27s/it] + 69%|██████▉ | 5112/7378 [17:32:09<7:40:50, 12.20s/it] + +{'loss': 0.3848, 
'learning_rate': 4.553364192773556e-06, 'epoch': 0.69} + + 69%|██████▉ | 5112/7378 [17:32:09<7:40:50, 12.20s/it] + 69%|██████▉ | 5113/7378 [17:32:21<7:41:58, 12.24s/it] + +{'loss': 0.4401, 'learning_rate': 4.549682896617304e-06, 'epoch': 0.69} + + 69%|██████▉ | 5113/7378 [17:32:21<7:41:58, 12.24s/it] + 69%|██████▉ | 5114/7378 [17:32:33<7:39:08, 12.17s/it] + +{'loss': 0.5028, 'learning_rate': 4.546002650924261e-06, 'epoch': 0.69} + + 69%|██████▉ | 5114/7378 [17:32:33<7:39:08, 12.17s/it] + 69%|██████▉ | 5115/7378 [17:32:45<7:35:46, 12.08s/it] + +{'loss': 0.4651, 'learning_rate': 4.542323456403733e-06, 'epoch': 0.69} + + 69%|██████▉ | 5115/7378 [17:32:45<7:35:46, 12.08s/it] + 69%|██████▉ | 5116/7378 [17:32:57<7:35:54, 12.09s/it] + +{'loss': 0.4127, 'learning_rate': 4.538645313764828e-06, 'epoch': 0.69} + + 69%|██████▉ | 5116/7378 [17:32:57<7:35:54, 12.09s/it] + 69%|██████▉ | 5117/7378 [17:33:10<7:39:25, 12.19s/it] + +{'loss': 0.45, 'learning_rate': 4.53496822371645e-06, 'epoch': 0.69} + + 69%|██████▉ | 5117/7378 [17:33:10<7:39:25, 12.19s/it] + 69%|██████▉ | 5118/7378 [17:33:22<7:40:42, 12.23s/it] + +{'loss': 0.5179, 'learning_rate': 4.531292186967298e-06, 'epoch': 0.69} + + 69%|██████▉ | 5118/7378 [17:33:22<7:40:42, 12.23s/it] + 69%|██████▉ | 5119/7378 [17:33:34<7:41:43, 12.26s/it] + +{'loss': 0.3617, 'learning_rate': 4.527617204225875e-06, 'epoch': 0.69} + + 69%|██████▉ | 5119/7378 [17:33:34<7:41:43, 12.26s/it] + 69%|██████▉ | 5120/7378 [17:33:47<7:44:23, 12.34s/it] + +{'loss': 0.4198, 'learning_rate': 4.523943276200476e-06, 'epoch': 0.69} + + 69%|██████▉ | 5120/7378 [17:33:47<7:44:23, 12.34s/it] + 69%|██████▉ | 5121/7378 [17:33:59<7:43:21, 12.32s/it] + +{'loss': 0.3806, 'learning_rate': 4.5202704035991895e-06, 'epoch': 0.69} + + 69%|██████▉ | 5121/7378 [17:33:59<7:43:21, 12.32s/it] + 69%|██████▉ | 5122/7378 [17:34:11<7:39:57, 12.23s/it] + +{'loss': 0.4312, 'learning_rate': 4.5165985871299045e-06, 'epoch': 0.69} + + 69%|██████▉ | 5122/7378 [17:34:11<7:39:57, 
12.23s/it] + 69%|██████▉ | 5123/7378 [17:34:24<7:41:03, 12.27s/it] + +{'loss': 0.4467, 'learning_rate': 4.51292782750031e-06, 'epoch': 0.69} + + 69%|██████▉ | 5123/7378 [17:34:24<7:41:03, 12.27s/it] + 69%|██████▉ | 5124/7378 [17:34:36<7:40:25, 12.26s/it] + +{'loss': 0.3776, 'learning_rate': 4.509258125417886e-06, 'epoch': 0.69} + + 69%|██████▉ | 5124/7378 [17:34:36<7:40:25, 12.26s/it] + 69%|██████▉ | 5125/7378 [17:34:48<7:43:08, 12.33s/it] + +{'loss': 0.4118, 'learning_rate': 4.5055894815899084e-06, 'epoch': 0.69} + + 69%|██████▉ | 5125/7378 [17:34:48<7:43:08, 12.33s/it] + 69%|██████▉ | 5126/7378 [17:35:00<7:38:41, 12.22s/it] + +{'loss': 0.4754, 'learning_rate': 4.5019218967234515e-06, 'epoch': 0.69} + + 69%|██████▉ | 5126/7378 [17:35:00<7:38:41, 12.22s/it] + 69%|██████▉ | 5127/7378 [17:35:13<7:39:36, 12.25s/it] + +{'loss': 0.4203, 'learning_rate': 4.4982553715253804e-06, 'epoch': 0.69} + + 69%|██████▉ | 5127/7378 [17:35:13<7:39:36, 12.25s/it] + 70%|██████▉ | 5128/7378 [17:35:25<7:40:32, 12.28s/it] + +{'loss': 0.3864, 'learning_rate': 4.494589906702369e-06, 'epoch': 0.7} + + 70%|██████▉ | 5128/7378 [17:35:25<7:40:32, 12.28s/it] + 70%|██████▉ | 5129/7378 [17:35:38<7:47:37, 12.48s/it] + +{'loss': 0.4279, 'learning_rate': 4.490925502960874e-06, 'epoch': 0.7} + + 70%|██████▉ | 5129/7378 [17:35:38<7:47:37, 12.48s/it] + 70%|██████▉ | 5130/7378 [17:35:50<7:45:31, 12.43s/it] + +{'loss': 0.4589, 'learning_rate': 4.487262161007153e-06, 'epoch': 0.7} + + 70%|██████▉ | 5130/7378 [17:35:50<7:45:31, 12.43s/it] + 70%|██████▉ | 5131/7378 [17:36:03<7:46:27, 12.46s/it] + +{'loss': 0.4108, 'learning_rate': 4.4835998815472515e-06, 'epoch': 0.7} + + 70%|██████▉ | 5131/7378 [17:36:03<7:46:27, 12.46s/it] + 70%|██████▉ | 5132/7378 [17:36:15<7:45:28, 12.43s/it] + +{'loss': 0.4379, 'learning_rate': 4.479938665287021e-06, 'epoch': 0.7} + + 70%|██████▉ | 5132/7378 [17:36:15<7:45:28, 12.43s/it] + 70%|██████▉ | 5133/7378 [17:36:28<7:45:33, 12.44s/it] + +{'loss': 0.4461, 'learning_rate': 
4.47627851293211e-06, 'epoch': 0.7} + + 70%|██████▉ | 5133/7378 [17:36:28<7:45:33, 12.44s/it] + 70%|██████▉ | 5134/7378 [17:36:40<7:43:53, 12.40s/it] + +{'loss': 0.457, 'learning_rate': 4.472619425187947e-06, 'epoch': 0.7} + + 70%|██████▉ | 5134/7378 [17:36:40<7:43:53, 12.40s/it] + 70%|██████▉ | 5135/7378 [17:36:52<7:39:26, 12.29s/it] + +{'loss': 0.475, 'learning_rate': 4.4689614027597685e-06, 'epoch': 0.7} + + 70%|██████▉ | 5135/7378 [17:36:52<7:39:26, 12.29s/it] + 70%|██████▉ | 5136/7378 [17:37:04<7:36:31, 12.22s/it] + +{'loss': 0.4653, 'learning_rate': 4.4653044463525975e-06, 'epoch': 0.7} + + 70%|██████▉ | 5136/7378 [17:37:04<7:36:31, 12.22s/it] + 70%|██████▉ | 5137/7378 [17:37:16<7:35:05, 12.18s/it] + +{'loss': 0.4186, 'learning_rate': 4.4616485566712534e-06, 'epoch': 0.7} + + 70%|██████▉ | 5137/7378 [17:37:16<7:35:05, 12.18s/it] + 70%|██████▉ | 5138/7378 [17:37:28<7:35:26, 12.20s/it] + +{'loss': 0.4502, 'learning_rate': 4.457993734420357e-06, 'epoch': 0.7} + + 70%|██████▉ | 5138/7378 [17:37:28<7:35:26, 12.20s/it] + 70%|██████▉ | 5139/7378 [17:37:41<7:39:05, 12.30s/it] + +{'loss': 0.4374, 'learning_rate': 4.454339980304317e-06, 'epoch': 0.7} + + 70%|██████▉ | 5139/7378 [17:37:41<7:39:05, 12.30s/it] + 70%|██████▉ | 5140/7378 [17:37:53<7:33:53, 12.17s/it] + +{'loss': 0.4308, 'learning_rate': 4.450687295027335e-06, 'epoch': 0.7} + + 70%|██████▉ | 5140/7378 [17:37:53<7:33:53, 12.17s/it] + 70%|██████▉ | 5141/7378 [17:38:05<7:30:30, 12.08s/it] + +{'loss': 0.3965, 'learning_rate': 4.447035679293407e-06, 'epoch': 0.7} + + 70%|██████▉ | 5141/7378 [17:38:05<7:30:30, 12.08s/it] + 70%|██████▉ | 5142/7378 [17:38:17<7:29:39, 12.07s/it] + +{'loss': 0.4659, 'learning_rate': 4.44338513380633e-06, 'epoch': 0.7} + + 70%|██████▉ | 5142/7378 [17:38:17<7:29:39, 12.07s/it] + 70%|██████▉ | 5143/7378 [17:38:29<7:32:14, 12.14s/it] + +{'loss': 0.4948, 'learning_rate': 4.439735659269688e-06, 'epoch': 0.7} + + 70%|██████▉ | 5143/7378 [17:38:29<7:32:14, 12.14s/it] + 70%|██████▉ | 5144/7378 
[17:38:42<7:39:54, 12.35s/it] + +{'loss': 0.4048, 'learning_rate': 4.436087256386859e-06, 'epoch': 0.7} + + 70%|██████▉ | 5144/7378 [17:38:42<7:39:54, 12.35s/it] + 70%|██████▉ | 5145/7378 [17:38:55<7:45:15, 12.50s/it] + +{'loss': 0.4645, 'learning_rate': 4.432439925861015e-06, 'epoch': 0.7} + + 70%|██████▉ | 5145/7378 [17:38:55<7:45:15, 12.50s/it] + 70%|██████▉ | 5146/7378 [17:39:07<7:47:11, 12.56s/it] + +{'loss': 0.3806, 'learning_rate': 4.428793668395118e-06, 'epoch': 0.7} + + 70%|██████▉ | 5146/7378 [17:39:07<7:47:11, 12.56s/it] + 70%|██████▉ | 5147/7378 [17:39:20<7:46:44, 12.55s/it] + +{'loss': 0.4205, 'learning_rate': 4.425148484691936e-06, 'epoch': 0.7} + + 70%|██████▉ | 5147/7378 [17:39:20<7:46:44, 12.55s/it] + 70%|██████▉ | 5148/7378 [17:39:32<7:42:21, 12.44s/it] + +{'loss': 0.4307, 'learning_rate': 4.421504375454016e-06, 'epoch': 0.7} + + 70%|██████▉ | 5148/7378 [17:39:32<7:42:21, 12.44s/it] + 70%|██████▉ | 5149/7378 [17:39:44<7:40:17, 12.39s/it] + +{'loss': 0.4508, 'learning_rate': 4.417861341383702e-06, 'epoch': 0.7} + + 70%|██████▉ | 5149/7378 [17:39:44<7:40:17, 12.39s/it] + 70%|██████▉ | 5150/7378 [17:39:57<7:40:21, 12.40s/it] + +{'loss': 0.4223, 'learning_rate': 4.414219383183129e-06, 'epoch': 0.7} + + 70%|██████▉ | 5150/7378 [17:39:57<7:40:21, 12.40s/it] + 70%|██████▉ | 5151/7378 [17:40:09<7:39:04, 12.37s/it] + +{'loss': 0.3969, 'learning_rate': 4.410578501554236e-06, 'epoch': 0.7} + + 70%|██████▉ | 5151/7378 [17:40:09<7:39:04, 12.37s/it] + 70%|██████▉ | 5152/7378 [17:40:21<7:35:26, 12.28s/it] + +{'loss': 0.4934, 'learning_rate': 4.406938697198741e-06, 'epoch': 0.7} + + 70%|██████▉ | 5152/7378 [17:40:21<7:35:26, 12.28s/it] + 70%|██████▉ | 5153/7378 [17:40:33<7:37:06, 12.33s/it] + +{'loss': 0.4351, 'learning_rate': 4.403299970818159e-06, 'epoch': 0.7} + + 70%|██████▉ | 5153/7378 [17:40:34<7:37:06, 12.33s/it] + 70%|██████▉ | 5154/7378 [17:40:46<7:34:47, 12.27s/it] + +{'loss': 0.4521, 'learning_rate': 4.399662323113798e-06, 'epoch': 0.7} + + 70%|██████▉ 
| 5154/7378 [17:40:46<7:34:47, 12.27s/it] + 70%|██████▉ | 5155/7378 [17:40:58<7:38:16, 12.37s/it] + +{'loss': 0.4678, 'learning_rate': 4.396025754786755e-06, 'epoch': 0.7} + + 70%|██████▉ | 5155/7378 [17:40:58<7:38:16, 12.37s/it] + 70%|██████▉ | 5156/7378 [17:41:11<7:39:20, 12.40s/it] + +{'loss': 0.4223, 'learning_rate': 4.392390266537926e-06, 'epoch': 0.7} + + 70%|██████▉ | 5156/7378 [17:41:11<7:39:20, 12.40s/it] + 70%|██████▉ | 5157/7378 [17:41:23<7:41:20, 12.46s/it] + +{'loss': 0.4039, 'learning_rate': 4.3887558590679925e-06, 'epoch': 0.7} + + 70%|██████▉ | 5157/7378 [17:41:23<7:41:20, 12.46s/it] + 70%|██████▉ | 5158/7378 [17:41:36<7:40:52, 12.46s/it] + +{'loss': 0.3865, 'learning_rate': 4.385122533077429e-06, 'epoch': 0.7} + + 70%|██████▉ | 5158/7378 [17:41:36<7:40:52, 12.46s/it] + 70%|██████▉ | 5159/7378 [17:41:48<7:39:40, 12.43s/it] + +{'loss': 0.4325, 'learning_rate': 4.381490289266505e-06, 'epoch': 0.7} + + 70%|██████▉ | 5159/7378 [17:41:48<7:39:40, 12.43s/it] + 70%|██████▉ | 5160/7378 [17:42:00<7:35:35, 12.32s/it] + +{'loss': 0.4961, 'learning_rate': 4.37785912833527e-06, 'epoch': 0.7} + + 70%|██████▉ | 5160/7378 [17:42:00<7:35:35, 12.32s/it] + 70%|██████▉ | 5161/7378 [17:42:12<7:33:59, 12.29s/it] + +{'loss': 0.4092, 'learning_rate': 4.374229050983585e-06, 'epoch': 0.7} + + 70%|██████▉ | 5161/7378 [17:42:12<7:33:59, 12.29s/it] + 70%|██████▉ | 5162/7378 [17:42:24<7:29:15, 12.16s/it] + +{'loss': 0.4272, 'learning_rate': 4.370600057911084e-06, 'epoch': 0.7} + + 70%|██████▉ | 5162/7378 [17:42:24<7:29:15, 12.16s/it] + 70%|██████▉ | 5163/7378 [17:42:36<7:27:07, 12.11s/it] + +{'loss': 0.4631, 'learning_rate': 4.366972149817199e-06, 'epoch': 0.7} + + 70%|██████▉ | 5163/7378 [17:42:36<7:27:07, 12.11s/it] + 70%|██████▉ | 5164/7378 [17:42:49<7:28:25, 12.15s/it] + +{'loss': 0.4818, 'learning_rate': 4.3633453274011506e-06, 'epoch': 0.7} + + 70%|██████▉ | 5164/7378 [17:42:49<7:28:25, 12.15s/it] + 70%|███████ | 5165/7378 [17:43:01<7:29:11, 12.18s/it] + +{'loss': 0.4447, 
'learning_rate': 4.359719591361957e-06, 'epoch': 0.7} + + 70%|███████ | 5165/7378 [17:43:01<7:29:11, 12.18s/it] + 70%|███████ | 5166/7378 [17:43:13<7:34:31, 12.33s/it] + +{'loss': 0.4418, 'learning_rate': 4.356094942398421e-06, 'epoch': 0.7} + + 70%|███████ | 5166/7378 [17:43:13<7:34:31, 12.33s/it] + 70%|███████ | 5167/7378 [17:43:26<7:36:46, 12.40s/it] + +{'loss': 0.4033, 'learning_rate': 4.352471381209134e-06, 'epoch': 0.7} + + 70%|███████ | 5167/7378 [17:43:26<7:36:46, 12.40s/it] + 70%|███████ | 5168/7378 [17:43:38<7:36:26, 12.39s/it] + +{'loss': 0.4278, 'learning_rate': 4.3488489084924825e-06, 'epoch': 0.7} + + 70%|███████ | 5168/7378 [17:43:38<7:36:26, 12.39s/it] + 70%|███████ | 5169/7378 [17:43:51<7:37:56, 12.44s/it] + +{'loss': 0.4874, 'learning_rate': 4.345227524946637e-06, 'epoch': 0.7} + + 70%|███████ | 5169/7378 [17:43:51<7:37:56, 12.44s/it] + 70%|███████ | 5170/7378 [17:44:04<7:40:39, 12.52s/it] + +{'loss': 0.4718, 'learning_rate': 4.341607231269569e-06, 'epoch': 0.7} + + 70%|███████ | 5170/7378 [17:44:04<7:40:39, 12.52s/it] + 70%|███████ | 5171/7378 [17:44:16<7:37:45, 12.44s/it] + +{'loss': 0.4796, 'learning_rate': 4.337988028159031e-06, 'epoch': 0.7} + + 70%|███████ | 5171/7378 [17:44:16<7:37:45, 12.44s/it] + 70%|███████ | 5172/7378 [17:44:28<7:38:12, 12.46s/it] + +{'loss': 0.416, 'learning_rate': 4.334369916312569e-06, 'epoch': 0.7} + + 70%|███████ | 5172/7378 [17:44:28<7:38:12, 12.46s/it] + 70%|███████ | 5173/7378 [17:44:41<7:35:58, 12.41s/it] + +{'loss': 0.438, 'learning_rate': 4.330752896427509e-06, 'epoch': 0.7} + + 70%|███████ | 5173/7378 [17:44:41<7:35:58, 12.41s/it] + 70%|███████ | 5174/7378 [17:44:55<7:52:02, 12.85s/it] + +{'loss': 0.4544, 'learning_rate': 4.327136969200987e-06, 'epoch': 0.7} + + 70%|███████ | 5174/7378 [17:44:55<7:52:02, 12.85s/it] + 70%|███████ | 5175/7378 [17:45:07<7:47:54, 12.74s/it] + +{'loss': 0.4341, 'learning_rate': 4.323522135329907e-06, 'epoch': 0.7} + + 70%|███████ | 5175/7378 [17:45:07<7:47:54, 12.74s/it] + 
70%|███████ | 5176/7378 [17:45:20<7:44:31, 12.66s/it] + +{'loss': 0.4158, 'learning_rate': 4.3199083955109785e-06, 'epoch': 0.7} + + 70%|███████ | 5176/7378 [17:45:20<7:44:31, 12.66s/it] + 70%|███████ | 5177/7378 [17:45:32<7:41:44, 12.59s/it] + +{'loss': 0.4328, 'learning_rate': 4.3162957504406915e-06, 'epoch': 0.7} + + 70%|███████ | 5177/7378 [17:45:32<7:41:44, 12.59s/it] + 70%|███████ | 5178/7378 [17:45:44<7:39:55, 12.54s/it] + +{'loss': 0.421, 'learning_rate': 4.312684200815324e-06, 'epoch': 0.7} + + 70%|███████ | 5178/7378 [17:45:44<7:39:55, 12.54s/it] + 70%|███████ | 5179/7378 [17:45:57<7:40:21, 12.56s/it] + +{'loss': 0.4332, 'learning_rate': 4.309073747330943e-06, 'epoch': 0.7} + + 70%|███████ | 5179/7378 [17:45:57<7:40:21, 12.56s/it] + 70%|███████ | 5180/7378 [17:46:09<7:38:32, 12.52s/it] + +{'loss': 0.4254, 'learning_rate': 4.3054643906834145e-06, 'epoch': 0.7} + + 70%|███████ | 5180/7378 [17:46:09<7:38:32, 12.52s/it] + 70%|███████ | 5181/7378 [17:46:22<7:36:25, 12.47s/it] + +{'loss': 0.4471, 'learning_rate': 4.3018561315683825e-06, 'epoch': 0.7} + + 70%|███████ | 5181/7378 [17:46:22<7:36:25, 12.47s/it] + 70%|███████ | 5182/7378 [17:46:34<7:35:17, 12.44s/it] + +{'loss': 0.4421, 'learning_rate': 4.2982489706812815e-06, 'epoch': 0.7} + + 70%|███████ | 5182/7378 [17:46:34<7:35:17, 12.44s/it] + 70%|███████ | 5183/7378 [17:46:46<7:31:19, 12.34s/it] + +{'loss': 0.4604, 'learning_rate': 4.294642908717332e-06, 'epoch': 0.7} + + 70%|███████ | 5183/7378 [17:46:46<7:31:19, 12.34s/it] + 70%|███████ | 5184/7378 [17:46:58<7:29:32, 12.29s/it] + +{'loss': 0.4597, 'learning_rate': 4.291037946371551e-06, 'epoch': 0.7} + + 70%|███████ | 5184/7378 [17:46:58<7:29:32, 12.29s/it] + 70%|███████ | 5185/7378 [17:47:11<7:31:52, 12.36s/it] + +{'loss': 0.4557, 'learning_rate': 4.287434084338739e-06, 'epoch': 0.7} + + 70%|███████ | 5185/7378 [17:47:11<7:31:52, 12.36s/it] + 70%|███████ | 5186/7378 [17:47:23<7:29:45, 12.31s/it] + +{'loss': 0.3791, 'learning_rate': 4.28383132331348e-06, 
'epoch': 0.7} + + 70%|███████ | 5186/7378 [17:47:23<7:29:45, 12.31s/it] + 70%|███████ | 5187/7378 [17:47:35<7:28:05, 12.27s/it] + +{'loss': 0.4437, 'learning_rate': 4.280229663990152e-06, 'epoch': 0.7} + + 70%|███████ | 5187/7378 [17:47:35<7:28:05, 12.27s/it] + 70%|███████ | 5188/7378 [17:47:47<7:25:06, 12.19s/it] + +{'loss': 0.4255, 'learning_rate': 4.276629107062914e-06, 'epoch': 0.7} + + 70%|███████ | 5188/7378 [17:47:47<7:25:06, 12.19s/it] + 70%|███████ | 5189/7378 [17:48:00<7:31:10, 12.37s/it] + +{'loss': 0.4826, 'learning_rate': 4.2730296532257244e-06, 'epoch': 0.7} + + 70%|███████ | 5189/7378 [17:48:00<7:31:10, 12.37s/it] + 70%|███████ | 5190/7378 [17:48:12<7:30:35, 12.36s/it] + +{'loss': 0.4715, 'learning_rate': 4.269431303172318e-06, 'epoch': 0.7} + + 70%|███████ | 5190/7378 [17:48:12<7:30:35, 12.36s/it] + 70%|███████ | 5191/7378 [17:48:24<7:26:13, 12.24s/it] + +{'loss': 0.4062, 'learning_rate': 4.26583405759622e-06, 'epoch': 0.7} + + 70%|███████ | 5191/7378 [17:48:24<7:26:13, 12.24s/it] + 70%|███████ | 5192/7378 [17:48:37<7:30:22, 12.36s/it] + +{'loss': 0.4323, 'learning_rate': 4.262237917190739e-06, 'epoch': 0.7} + + 70%|███████ | 5192/7378 [17:48:37<7:30:22, 12.36s/it] + 70%|███████ | 5193/7378 [17:48:49<7:24:19, 12.20s/it] + +{'loss': 0.4718, 'learning_rate': 4.258642882648984e-06, 'epoch': 0.7} + + 70%|███████ | 5193/7378 [17:48:49<7:24:19, 12.20s/it] + 70%|███████ | 5194/7378 [17:49:01<7:26:00, 12.25s/it] + +{'loss': 0.4235, 'learning_rate': 4.255048954663835e-06, 'epoch': 0.7} + + 70%|███████ | 5194/7378 [17:49:01<7:26:00, 12.25s/it] + 70%|███████ | 5195/7378 [17:49:13<7:24:24, 12.21s/it] + +{'loss': 0.4353, 'learning_rate': 4.251456133927968e-06, 'epoch': 0.7} + + 70%|███████ | 5195/7378 [17:49:13<7:24:24, 12.21s/it] + 70%|███████ | 5196/7378 [17:49:26<7:24:04, 12.21s/it] + +{'loss': 0.4028, 'learning_rate': 4.247864421133841e-06, 'epoch': 0.7} + + 70%|███████ | 5196/7378 [17:49:26<7:24:04, 12.21s/it] + 70%|███████ | 5197/7378 [17:49:38<7:25:03, 
12.24s/it] + +{'loss': 0.4179, 'learning_rate': 4.244273816973698e-06, 'epoch': 0.7} + + 70%|███████ | 5197/7378 [17:49:38<7:25:03, 12.24s/it] + 70%|███████ | 5198/7378 [17:49:50<7:22:32, 12.18s/it] + +{'loss': 0.4273, 'learning_rate': 4.240684322139579e-06, 'epoch': 0.7} + + 70%|███████ | 5198/7378 [17:49:50<7:22:32, 12.18s/it] + 70%|███████ | 5199/7378 [17:50:02<7:23:29, 12.21s/it] + +{'loss': 0.4085, 'learning_rate': 4.237095937323298e-06, 'epoch': 0.7} + + 70%|███████ | 5199/7378 [17:50:02<7:23:29, 12.21s/it] + 70%|███████ | 5200/7378 [17:50:15<7:25:13, 12.27s/it] + +{'loss': 0.3799, 'learning_rate': 4.23350866321646e-06, 'epoch': 0.7} + + 70%|███████ | 5200/7378 [17:50:15<7:25:13, 12.27s/it] + 70%|███████ | 5201/7378 [17:50:27<7:26:09, 12.30s/it] + +{'loss': 0.4493, 'learning_rate': 4.229922500510454e-06, 'epoch': 0.7} + + 70%|███████ | 5201/7378 [17:50:27<7:26:09, 12.30s/it] + 71%|███████ | 5202/7378 [17:50:40<7:32:04, 12.47s/it] + +{'loss': 0.5173, 'learning_rate': 4.226337449896462e-06, 'epoch': 0.71} + + 71%|███████ | 5202/7378 [17:50:40<7:32:04, 12.47s/it] + 71%|███████ | 5203/7378 [17:50:52<7:27:43, 12.35s/it] + +{'loss': 0.3587, 'learning_rate': 4.222753512065444e-06, 'epoch': 0.71} + + 71%|███████ | 5203/7378 [17:50:52<7:27:43, 12.35s/it] + 71%|███████ | 5204/7378 [17:51:04<7:28:41, 12.38s/it] + +{'loss': 0.5006, 'learning_rate': 4.219170687708147e-06, 'epoch': 0.71} + + 71%|███████ | 5204/7378 [17:51:04<7:28:41, 12.38s/it] + 71%|███████ | 5205/7378 [17:51:16<7:24:58, 12.29s/it] + +{'loss': 0.4823, 'learning_rate': 4.2155889775151045e-06, 'epoch': 0.71} + + 71%|███████ | 5205/7378 [17:51:16<7:24:58, 12.29s/it] + 71%|███████ | 5206/7378 [17:51:29<7:27:03, 12.35s/it] + +{'loss': 0.4777, 'learning_rate': 4.212008382176631e-06, 'epoch': 0.71} + + 71%|███████ | 5206/7378 [17:51:29<7:27:03, 12.35s/it] + 71%|███████ | 5207/7378 [17:51:42<7:29:27, 12.42s/it] + +{'loss': 0.4254, 'learning_rate': 4.208428902382839e-06, 'epoch': 0.71} + + 71%|███████ | 5207/7378 
[17:51:42<7:29:27, 12.42s/it] + 71%|███████ | 5208/7378 [17:51:54<7:31:01, 12.47s/it] + +{'loss': 0.4774, 'learning_rate': 4.204850538823612e-06, 'epoch': 0.71} + + 71%|███████ | 5208/7378 [17:51:54<7:31:01, 12.47s/it] + 71%|███████ | 5209/7378 [17:52:07<7:33:51, 12.55s/it] + +{'loss': 0.4673, 'learning_rate': 4.201273292188622e-06, 'epoch': 0.71} + + 71%|███████ | 5209/7378 [17:52:07<7:33:51, 12.55s/it] + 71%|███████ | 5210/7378 [17:52:19<7:30:41, 12.47s/it] + +{'loss': 0.4629, 'learning_rate': 4.197697163167328e-06, 'epoch': 0.71} + + 71%|███████ | 5210/7378 [17:52:19<7:30:41, 12.47s/it] + 71%|███████ | 5211/7378 [17:52:32<7:29:48, 12.45s/it] + +{'loss': 0.3988, 'learning_rate': 4.194122152448976e-06, 'epoch': 0.71} + + 71%|███████ | 5211/7378 [17:52:32<7:29:48, 12.45s/it] + 71%|███████ | 5212/7378 [17:52:44<7:29:50, 12.46s/it] + +{'loss': 0.5038, 'learning_rate': 4.190548260722591e-06, 'epoch': 0.71} + + 71%|███████ | 5212/7378 [17:52:44<7:29:50, 12.46s/it] + 71%|███████ | 5213/7378 [17:52:56<7:28:50, 12.44s/it] + +{'loss': 0.4725, 'learning_rate': 4.186975488676987e-06, 'epoch': 0.71} + + 71%|███████ | 5213/7378 [17:52:56<7:28:50, 12.44s/it] + 71%|███████ | 5214/7378 [17:53:08<7:24:04, 12.31s/it] + +{'loss': 0.4567, 'learning_rate': 4.183403837000755e-06, 'epoch': 0.71} + + 71%|███████ | 5214/7378 [17:53:08<7:24:04, 12.31s/it] + 71%|███████ | 5215/7378 [17:53:21<7:27:01, 12.40s/it] + +{'loss': 0.4736, 'learning_rate': 4.179833306382275e-06, 'epoch': 0.71} + + 71%|███████ | 5215/7378 [17:53:21<7:27:01, 12.40s/it] + 71%|███████ | 5216/7378 [17:53:34<7:32:54, 12.57s/it] + +{'loss': 0.4713, 'learning_rate': 4.176263897509717e-06, 'epoch': 0.71} + + 71%|███████ | 5216/7378 [17:53:34<7:32:54, 12.57s/it] + 71%|███████ | 5217/7378 [17:53:47<7:33:53, 12.60s/it] + +{'loss': 0.4276, 'learning_rate': 4.172695611071025e-06, 'epoch': 0.71} + + 71%|███████ | 5217/7378 [17:53:47<7:33:53, 12.60s/it] + 71%|███████ | 5218/7378 [17:53:59<7:35:10, 12.64s/it] + +{'loss': 0.4311, 
'learning_rate': 4.16912844775393e-06, 'epoch': 0.71} + + 71%|███████ | 5218/7378 [17:53:59<7:35:10, 12.64s/it] + 71%|███████ | 5219/7378 [17:54:12<7:30:11, 12.51s/it] + +{'loss': 0.4062, 'learning_rate': 4.165562408245942e-06, 'epoch': 0.71} + + 71%|███████ | 5219/7378 [17:54:12<7:30:11, 12.51s/it] + 71%|███████ | 5220/7378 [17:54:24<7:30:00, 12.51s/it] + +{'loss': 0.4599, 'learning_rate': 4.16199749323437e-06, 'epoch': 0.71} + + 71%|███████ | 5220/7378 [17:54:24<7:30:00, 12.51s/it] + 71%|███████ | 5221/7378 [17:54:36<7:26:13, 12.41s/it] + +{'loss': 0.4439, 'learning_rate': 4.158433703406285e-06, 'epoch': 0.71} + + 71%|███████ | 5221/7378 [17:54:36<7:26:13, 12.41s/it] + 71%|███████ | 5222/7378 [17:54:48<7:22:30, 12.31s/it] + +{'loss': 0.5205, 'learning_rate': 4.154871039448561e-06, 'epoch': 0.71} + + 71%|███████ | 5222/7378 [17:54:48<7:22:30, 12.31s/it] + 71%|███████ | 5223/7378 [17:55:01<7:22:07, 12.31s/it] + +{'loss': 0.4457, 'learning_rate': 4.15130950204784e-06, 'epoch': 0.71} + + 71%|███████ | 5223/7378 [17:55:01<7:22:07, 12.31s/it] + 71%|███████ | 5224/7378 [17:55:13<7:22:13, 12.32s/it] + +{'loss': 0.4708, 'learning_rate': 4.147749091890555e-06, 'epoch': 0.71} + + 71%|███████ | 5224/7378 [17:55:13<7:22:13, 12.32s/it] + 71%|███████ | 5225/7378 [17:55:25<7:23:02, 12.35s/it] + +{'loss': 0.3538, 'learning_rate': 4.144189809662913e-06, 'epoch': 0.71} + + 71%|███████ | 5225/7378 [17:55:25<7:23:02, 12.35s/it] + 71%|███████ | 5226/7378 [17:55:38<7:21:07, 12.30s/it] + +{'loss': 0.42, 'learning_rate': 4.140631656050919e-06, 'epoch': 0.71} + + 71%|███████ | 5226/7378 [17:55:38<7:21:07, 12.30s/it] + 71%|███████ | 5227/7378 [17:55:50<7:17:39, 12.21s/it] + +{'loss': 0.4078, 'learning_rate': 4.137074631740346e-06, 'epoch': 0.71} + + 71%|███████ | 5227/7378 [17:55:50<7:17:39, 12.21s/it] + 71%|███████ | 5228/7378 [17:56:02<7:16:09, 12.17s/it] + +{'loss': 0.5327, 'learning_rate': 4.133518737416757e-06, 'epoch': 0.71} + + 71%|███████ | 5228/7378 [17:56:02<7:16:09, 12.17s/it] + 
71%|███████ | 5229/7378 [17:56:13<7:11:34, 12.05s/it] + +{'loss': 0.4198, 'learning_rate': 4.129963973765493e-06, 'epoch': 0.71} + + 71%|███████ | 5229/7378 [17:56:13<7:11:34, 12.05s/it] + 71%|███████ | 5230/7378 [17:56:26<7:13:44, 12.12s/it] + +{'loss': 0.3928, 'learning_rate': 4.126410341471676e-06, 'epoch': 0.71} + + 71%|███████ | 5230/7378 [17:56:26<7:13:44, 12.12s/it] + 71%|███████ | 5231/7378 [17:56:38<7:14:30, 12.14s/it] + +{'loss': 0.5029, 'learning_rate': 4.12285784122022e-06, 'epoch': 0.71} + + 71%|███████ | 5231/7378 [17:56:38<7:14:30, 12.14s/it] + 71%|███████ | 5232/7378 [17:56:50<7:15:40, 12.18s/it] + +{'loss': 0.4403, 'learning_rate': 4.119306473695811e-06, 'epoch': 0.71} + + 71%|███████ | 5232/7378 [17:56:50<7:15:40, 12.18s/it] + 71%|███████ | 5233/7378 [17:57:02<7:11:43, 12.08s/it] + +{'loss': 0.3569, 'learning_rate': 4.1157562395829186e-06, 'epoch': 0.71} + + 71%|███████ | 5233/7378 [17:57:02<7:11:43, 12.08s/it] + 71%|███████ | 5234/7378 [17:57:14<7:15:23, 12.18s/it] + +{'loss': 0.4284, 'learning_rate': 4.112207139565792e-06, 'epoch': 0.71} + + 71%|███████ | 5234/7378 [17:57:14<7:15:23, 12.18s/it] + 71%|███████ | 5235/7378 [17:57:26<7:13:18, 12.13s/it] + +{'loss': 0.4058, 'learning_rate': 4.10865917432847e-06, 'epoch': 0.71} + + 71%|███████ | 5235/7378 [17:57:27<7:13:18, 12.13s/it] + 71%|███████ | 5236/7378 [17:57:39<7:14:33, 12.17s/it] + +{'loss': 0.4747, 'learning_rate': 4.105112344554765e-06, 'epoch': 0.71} + + 71%|███████ | 5236/7378 [17:57:39<7:14:33, 12.17s/it] + 71%|███████ | 5237/7378 [17:57:51<7:17:16, 12.25s/it] + +{'loss': 0.4529, 'learning_rate': 4.101566650928273e-06, 'epoch': 0.71} + + 71%|███████ | 5237/7378 [17:57:51<7:17:16, 12.25s/it] + 71%|███████ | 5238/7378 [17:58:03<7:14:47, 12.19s/it] + +{'loss': 0.4601, 'learning_rate': 4.098022094132371e-06, 'epoch': 0.71} + + 71%|███████ | 5238/7378 [17:58:03<7:14:47, 12.19s/it] + 71%|███████ | 5239/7378 [17:58:16<7:18:47, 12.31s/it] + +{'loss': 0.3769, 'learning_rate': 
4.094478674850212e-06, 'epoch': 0.71} + + 71%|███████ | 5239/7378 [17:58:16<7:18:47, 12.31s/it] + 71%|███████ | 5240/7378 [17:58:28<7:18:58, 12.32s/it] + +{'loss': 0.4109, 'learning_rate': 4.090936393764743e-06, 'epoch': 0.71} + + 71%|███████ | 5240/7378 [17:58:28<7:18:58, 12.32s/it] + 71%|███████ | 5241/7378 [17:58:41<7:20:43, 12.37s/it] + +{'loss': 0.4431, 'learning_rate': 4.087395251558679e-06, 'epoch': 0.71} + + 71%|███████ | 5241/7378 [17:58:41<7:20:43, 12.37s/it] + 71%|███████ | 5242/7378 [17:58:53<7:17:51, 12.30s/it] + +{'loss': 0.5355, 'learning_rate': 4.083855248914519e-06, 'epoch': 0.71} + + 71%|███████ | 5242/7378 [17:58:53<7:17:51, 12.30s/it] + 71%|███████ | 5243/7378 [17:59:05<7:14:31, 12.21s/it] + +{'loss': 0.4194, 'learning_rate': 4.080316386514541e-06, 'epoch': 0.71} + + 71%|███████ | 5243/7378 [17:59:05<7:14:31, 12.21s/it] + 71%|███████ | 5244/7378 [17:59:17<7:15:19, 12.24s/it] + +{'loss': 0.4557, 'learning_rate': 4.076778665040811e-06, 'epoch': 0.71} + + 71%|███████ | 5244/7378 [17:59:17<7:15:19, 12.24s/it] + 71%|███████ | 5245/7378 [17:59:29<7:15:11, 12.24s/it] + +{'loss': 0.4247, 'learning_rate': 4.073242085175167e-06, 'epoch': 0.71} + + 71%|███████ | 5245/7378 [17:59:29<7:15:11, 12.24s/it] + 71%|███████ | 5246/7378 [17:59:42<7:16:46, 12.29s/it] + +{'loss': 0.4498, 'learning_rate': 4.069706647599229e-06, 'epoch': 0.71} + + 71%|███████ | 5246/7378 [17:59:42<7:16:46, 12.29s/it] + 71%|███████ | 5247/7378 [17:59:54<7:12:54, 12.19s/it] + +{'loss': 0.4247, 'learning_rate': 4.066172352994395e-06, 'epoch': 0.71} + + 71%|███████ | 5247/7378 [17:59:54<7:12:54, 12.19s/it] + 71%|███████ | 5248/7378 [18:00:06<7:14:01, 12.23s/it] + +{'loss': 0.4174, 'learning_rate': 4.062639202041845e-06, 'epoch': 0.71} + + 71%|███████ | 5248/7378 [18:00:06<7:14:01, 12.23s/it] + 71%|███████ | 5249/7378 [18:00:18<7:12:44, 12.20s/it] + +{'loss': 0.424, 'learning_rate': 4.059107195422544e-06, 'epoch': 0.71} + + 71%|███████ | 5249/7378 [18:00:18<7:12:44, 12.20s/it] + 71%|███████ 
| 5250/7378 [18:00:31<7:14:13, 12.24s/it] + +{'loss': 0.4033, 'learning_rate': 4.055576333817226e-06, 'epoch': 0.71} + + 71%|███████ | 5250/7378 [18:00:31<7:14:13, 12.24s/it] + 71%|███████ | 5251/7378 [18:00:42<7:10:31, 12.14s/it] + +{'loss': 0.4299, 'learning_rate': 4.052046617906412e-06, 'epoch': 0.71} + + 71%|███████ | 5251/7378 [18:00:42<7:10:31, 12.14s/it] + 71%|███████ | 5252/7378 [18:00:55<7:10:13, 12.14s/it] + +{'loss': 0.4337, 'learning_rate': 4.048518048370394e-06, 'epoch': 0.71} + + 71%|███████ | 5252/7378 [18:00:55<7:10:13, 12.14s/it] + 71%|███████ | 5253/7378 [18:01:07<7:12:29, 12.21s/it] + +{'loss': 0.4622, 'learning_rate': 4.044990625889255e-06, 'epoch': 0.71} + + 71%|███████ | 5253/7378 [18:01:07<7:12:29, 12.21s/it] + 71%|███████ | 5254/7378 [18:01:19<7:13:15, 12.24s/it] + +{'loss': 0.4421, 'learning_rate': 4.041464351142847e-06, 'epoch': 0.71} + + 71%|███████ | 5254/7378 [18:01:19<7:13:15, 12.24s/it] + 71%|███████ | 5255/7378 [18:01:32<7:15:51, 12.32s/it] + +{'loss': 0.4472, 'learning_rate': 4.037939224810807e-06, 'epoch': 0.71} + + 71%|███████ | 5255/7378 [18:01:32<7:15:51, 12.32s/it] + 71%|███████ | 5256/7378 [18:01:44<7:12:35, 12.23s/it] + +{'loss': 0.4453, 'learning_rate': 4.034415247572545e-06, 'epoch': 0.71} + + 71%|███████ | 5256/7378 [18:01:44<7:12:35, 12.23s/it] + 71%|███████▏ | 5257/7378 [18:01:56<7:13:05, 12.25s/it] + +{'loss': 0.4304, 'learning_rate': 4.0308924201072495e-06, 'epoch': 0.71} + + 71%|███████▏ | 5257/7378 [18:01:56<7:13:05, 12.25s/it] + 71%|███████▏ | 5258/7378 [18:02:08<7:12:44, 12.25s/it] + +{'loss': 0.4298, 'learning_rate': 4.027370743093898e-06, 'epoch': 0.71} + + 71%|███████▏ | 5258/7378 [18:02:08<7:12:44, 12.25s/it] + 71%|███████▏ | 5259/7378 [18:02:20<7:09:43, 12.17s/it] + +{'loss': 0.5036, 'learning_rate': 4.023850217211234e-06, 'epoch': 0.71} + + 71%|███████▏ | 5259/7378 [18:02:20<7:09:43, 12.17s/it] + 71%|███████▏ | 5260/7378 [18:02:32<7:06:09, 12.07s/it] + +{'loss': 0.4169, 'learning_rate': 4.020330843137784e-06, 
'epoch': 0.71} + + 71%|███████▏ | 5260/7378 [18:02:32<7:06:09, 12.07s/it] + 71%|███████▏ | 5261/7378 [18:02:45<7:09:10, 12.16s/it] + +{'loss': 0.3696, 'learning_rate': 4.01681262155185e-06, 'epoch': 0.71} + + 71%|███████▏ | 5261/7378 [18:02:45<7:09:10, 12.16s/it] + 71%|███████▏ | 5262/7378 [18:02:57<7:08:06, 12.14s/it] + +{'loss': 0.4298, 'learning_rate': 4.013295553131515e-06, 'epoch': 0.71} + + 71%|███████▏ | 5262/7378 [18:02:57<7:08:06, 12.14s/it] + 71%|███████▏ | 5263/7378 [18:03:09<7:14:36, 12.33s/it] + +{'loss': 0.4576, 'learning_rate': 4.009779638554645e-06, 'epoch': 0.71} + + 71%|███████▏ | 5263/7378 [18:03:09<7:14:36, 12.33s/it] + 71%|███████▏ | 5264/7378 [18:03:21<7:11:58, 12.26s/it] + +{'loss': 0.4726, 'learning_rate': 4.0062648784988735e-06, 'epoch': 0.71} + + 71%|███████▏ | 5264/7378 [18:03:21<7:11:58, 12.26s/it] + 71%|███████▏ | 5265/7378 [18:03:34<7:11:13, 12.25s/it] + +{'loss': 0.4686, 'learning_rate': 4.002751273641613e-06, 'epoch': 0.71} + + 71%|███████▏ | 5265/7378 [18:03:34<7:11:13, 12.25s/it] + 71%|███████▏ | 5266/7378 [18:03:46<7:10:58, 12.24s/it] + +{'loss': 0.4084, 'learning_rate': 3.999238824660058e-06, 'epoch': 0.71} + + 71%|███████▏ | 5266/7378 [18:03:46<7:10:58, 12.24s/it] + 71%|███████▏ | 5267/7378 [18:03:58<7:12:20, 12.29s/it] + +{'loss': 0.3984, 'learning_rate': 3.995727532231174e-06, 'epoch': 0.71} + + 71%|███████▏ | 5267/7378 [18:03:58<7:12:20, 12.29s/it] + 71%|███████▏ | 5268/7378 [18:04:10<7:06:32, 12.13s/it] + +{'loss': 0.4602, 'learning_rate': 3.992217397031715e-06, 'epoch': 0.71} + + 71%|███████▏ | 5268/7378 [18:04:10<7:06:32, 12.13s/it] + 71%|███████▏ | 5269/7378 [18:04:22<7:09:17, 12.21s/it] + +{'loss': 0.3715, 'learning_rate': 3.9887084197382e-06, 'epoch': 0.71} + + 71%|███████▏ | 5269/7378 [18:04:22<7:09:17, 12.21s/it] + 71%|███████▏ | 5270/7378 [18:04:35<7:09:45, 12.23s/it] + +{'loss': 0.4377, 'learning_rate': 3.985200601026931e-06, 'epoch': 0.71} + + 71%|███████▏ | 5270/7378 [18:04:35<7:09:45, 12.23s/it] + 71%|███████▏ | 
5271/7378 [18:04:47<7:07:13, 12.17s/it] + +{'loss': 0.4118, 'learning_rate': 3.981693941573979e-06, 'epoch': 0.71} + + 71%|███████▏ | 5271/7378 [18:04:47<7:07:13, 12.17s/it] + 71%|███████▏ | 5272/7378 [18:04:59<7:12:17, 12.32s/it] + +{'loss': 0.428, 'learning_rate': 3.978188442055207e-06, 'epoch': 0.71} + + 71%|███████▏ | 5272/7378 [18:04:59<7:12:17, 12.32s/it] + 71%|███████▏ | 5273/7378 [18:05:12<7:09:26, 12.24s/it] + +{'loss': 0.4536, 'learning_rate': 3.97468410314624e-06, 'epoch': 0.71} + + 71%|███████▏ | 5273/7378 [18:05:12<7:09:26, 12.24s/it] + 71%|███████▏ | 5274/7378 [18:05:23<7:06:10, 12.15s/it] + +{'loss': 0.4521, 'learning_rate': 3.971180925522487e-06, 'epoch': 0.71} + + 71%|███████▏ | 5274/7378 [18:05:23<7:06:10, 12.15s/it] + 71%|███████▏ | 5275/7378 [18:05:36<7:15:13, 12.42s/it] + +{'loss': 0.4784, 'learning_rate': 3.9676789098591275e-06, 'epoch': 0.71} + + 71%|███████▏ | 5275/7378 [18:05:36<7:15:13, 12.42s/it] + 72%|███████▏ | 5276/7378 [18:05:49<7:16:25, 12.46s/it] + +{'loss': 0.4183, 'learning_rate': 3.964178056831117e-06, 'epoch': 0.72} + + 72%|███████▏ | 5276/7378 [18:05:49<7:16:25, 12.46s/it] + 72%|███████▏ | 5277/7378 [18:06:02<7:17:02, 12.48s/it] + +{'loss': 0.412, 'learning_rate': 3.9606783671132e-06, 'epoch': 0.72} + + 72%|███████▏ | 5277/7378 [18:06:02<7:17:02, 12.48s/it] + 72%|███████▏ | 5278/7378 [18:06:14<7:12:22, 12.35s/it] + +{'loss': 0.3917, 'learning_rate': 3.95717984137988e-06, 'epoch': 0.72} + + 72%|███████▏ | 5278/7378 [18:06:14<7:12:22, 12.35s/it] + 72%|███████▏ | 5279/7378 [18:06:26<7:12:07, 12.35s/it] + +{'loss': 0.3815, 'learning_rate': 3.953682480305445e-06, 'epoch': 0.72} + + 72%|███████▏ | 5279/7378 [18:06:26<7:12:07, 12.35s/it] + 72%|███████▏ | 5280/7378 [18:06:38<7:09:43, 12.29s/it] + +{'loss': 0.4786, 'learning_rate': 3.950186284563956e-06, 'epoch': 0.72} + + 72%|███████▏ | 5280/7378 [18:06:38<7:09:43, 12.29s/it] + 72%|███████▏ | 5281/7378 [18:06:51<7:10:45, 12.32s/it] + +{'loss': 0.4314, 'learning_rate': 
3.946691254829246e-06, 'epoch': 0.72} + + 72%|███████▏ | 5281/7378 [18:06:51<7:10:45, 12.32s/it] + 72%|███████▏ | 5282/7378 [18:07:03<7:07:59, 12.25s/it] + +{'loss': 0.4177, 'learning_rate': 3.9431973917749345e-06, 'epoch': 0.72} + + 72%|███████▏ | 5282/7378 [18:07:03<7:07:59, 12.25s/it] + 72%|███████▏ | 5283/7378 [18:07:15<7:13:37, 12.42s/it] + +{'loss': 0.4577, 'learning_rate': 3.939704696074405e-06, 'epoch': 0.72} + + 72%|███████▏ | 5283/7378 [18:07:15<7:13:37, 12.42s/it] + 72%|███████▏ | 5284/7378 [18:07:28<7:14:59, 12.46s/it] + +{'loss': 0.4068, 'learning_rate': 3.936213168400821e-06, 'epoch': 0.72} + + 72%|███████▏ | 5284/7378 [18:07:28<7:14:59, 12.46s/it] + 72%|███████▏ | 5285/7378 [18:07:40<7:09:50, 12.32s/it] + +{'loss': 0.4097, 'learning_rate': 3.932722809427114e-06, 'epoch': 0.72} + + 72%|███████▏ | 5285/7378 [18:07:40<7:09:50, 12.32s/it] + 72%|███████▏ | 5286/7378 [18:07:52<7:09:54, 12.33s/it] + +{'loss': 0.4343, 'learning_rate': 3.929233619826006e-06, 'epoch': 0.72} + + 72%|███████▏ | 5286/7378 [18:07:52<7:09:54, 12.33s/it] + 72%|███████▏ | 5287/7378 [18:08:05<7:08:32, 12.30s/it] + +{'loss': 0.4955, 'learning_rate': 3.925745600269978e-06, 'epoch': 0.72} + + 72%|███████▏ | 5287/7378 [18:08:05<7:08:32, 12.30s/it] + 72%|███████▏ | 5288/7378 [18:08:17<7:09:13, 12.32s/it] + +{'loss': 0.4191, 'learning_rate': 3.922258751431293e-06, 'epoch': 0.72} + + 72%|███████▏ | 5288/7378 [18:08:17<7:09:13, 12.32s/it] + 72%|███████▏ | 5289/7378 [18:08:29<7:05:58, 12.23s/it] + +{'loss': 0.4568, 'learning_rate': 3.918773073981983e-06, 'epoch': 0.72} + + 72%|███████▏ | 5289/7378 [18:08:29<7:05:58, 12.23s/it] + 72%|███████▏ | 5290/7378 [18:08:42<7:09:30, 12.34s/it] + +{'loss': 0.4074, 'learning_rate': 3.915288568593857e-06, 'epoch': 0.72} + + 72%|███████▏ | 5290/7378 [18:08:42<7:09:30, 12.34s/it] + 72%|███████▏ | 5291/7378 [18:08:55<7:17:18, 12.57s/it] + +{'loss': 0.4818, 'learning_rate': 3.911805235938506e-06, 'epoch': 0.72} + + 72%|███████▏ | 5291/7378 [18:08:55<7:17:18, 
12.57s/it] + 72%|███████▏ | 5292/7378 [18:09:07<7:16:29, 12.55s/it] + +{'loss': 0.457, 'learning_rate': 3.908323076687282e-06, 'epoch': 0.72} + + 72%|███████▏ | 5292/7378 [18:09:07<7:16:29, 12.55s/it] + 72%|███████▏ | 5293/7378 [18:09:19<7:11:54, 12.43s/it] + +{'loss': 0.4023, 'learning_rate': 3.90484209151132e-06, 'epoch': 0.72} + + 72%|███████▏ | 5293/7378 [18:09:19<7:11:54, 12.43s/it] + 72%|███████▏ | 5294/7378 [18:09:31<7:05:43, 12.26s/it] + +{'loss': 0.4411, 'learning_rate': 3.901362281081519e-06, 'epoch': 0.72} + + 72%|███████▏ | 5294/7378 [18:09:31<7:05:43, 12.26s/it] + 72%|███████▏ | 5295/7378 [18:09:43<7:03:00, 12.18s/it] + +{'loss': 0.3352, 'learning_rate': 3.897883646068565e-06, 'epoch': 0.72} + + 72%|███████▏ | 5295/7378 [18:09:43<7:03:00, 12.18s/it] + 72%|███████▏ | 5296/7378 [18:09:55<7:03:26, 12.20s/it] + +{'loss': 0.4469, 'learning_rate': 3.894406187142908e-06, 'epoch': 0.72} + + 72%|███████▏ | 5296/7378 [18:09:55<7:03:26, 12.20s/it] + 72%|███████▏ | 5297/7378 [18:10:08<7:02:26, 12.18s/it] + +{'loss': 0.4458, 'learning_rate': 3.890929904974775e-06, 'epoch': 0.72} + + 72%|███████▏ | 5297/7378 [18:10:08<7:02:26, 12.18s/it] + 72%|███████▏ | 5298/7378 [18:10:20<7:04:21, 12.24s/it] + +{'loss': 0.364, 'learning_rate': 3.887454800234161e-06, 'epoch': 0.72} + + 72%|███████▏ | 5298/7378 [18:10:20<7:04:21, 12.24s/it] + 72%|███████▏ | 5299/7378 [18:10:32<7:06:26, 12.31s/it] + +{'loss': 0.3742, 'learning_rate': 3.883980873590839e-06, 'epoch': 0.72} + + 72%|███████▏ | 5299/7378 [18:10:32<7:06:26, 12.31s/it] + 72%|███████▏ | 5300/7378 [18:10:45<7:06:57, 12.33s/it] + +{'loss': 0.544, 'learning_rate': 3.880508125714357e-06, 'epoch': 0.72} + + 72%|███████▏ | 5300/7378 [18:10:45<7:06:57, 12.33s/it] + 72%|███████▏ | 5301/7378 [18:10:57<7:06:47, 12.33s/it] + +{'loss': 0.5056, 'learning_rate': 3.877036557274032e-06, 'epoch': 0.72} + + 72%|███████▏ | 5301/7378 [18:10:57<7:06:47, 12.33s/it] + 72%|███████▏ | 5302/7378 [18:11:10<7:11:29, 12.47s/it] + +{'loss': 0.4612, 
'learning_rate': 3.8735661689389535e-06, 'epoch': 0.72} + + 72%|███████▏ | 5302/7378 [18:11:10<7:11:29, 12.47s/it] + 72%|███████▏ | 5303/7378 [18:11:22<7:10:28, 12.45s/it] + +{'loss': 0.458, 'learning_rate': 3.870096961377981e-06, 'epoch': 0.72} + + 72%|███████▏ | 5303/7378 [18:11:22<7:10:28, 12.45s/it] + 72%|███████▏ | 5304/7378 [18:11:34<7:05:58, 12.32s/it] + +{'loss': 0.4133, 'learning_rate': 3.866628935259755e-06, 'epoch': 0.72} + + 72%|███████▏ | 5304/7378 [18:11:34<7:05:58, 12.32s/it] + 72%|███████▏ | 5305/7378 [18:11:47<7:06:57, 12.36s/it] + +{'loss': 0.4656, 'learning_rate': 3.863162091252682e-06, 'epoch': 0.72} + + 72%|███████▏ | 5305/7378 [18:11:47<7:06:57, 12.36s/it] + 72%|███████▏ | 5306/7378 [18:12:00<7:10:48, 12.48s/it] + +{'loss': 0.383, 'learning_rate': 3.859696430024939e-06, 'epoch': 0.72} + + 72%|███████▏ | 5306/7378 [18:12:00<7:10:48, 12.48s/it] + 72%|███████▏ | 5307/7378 [18:12:12<7:07:48, 12.39s/it] + +{'loss': 0.4928, 'learning_rate': 3.856231952244483e-06, 'epoch': 0.72} + + 72%|███████▏ | 5307/7378 [18:12:12<7:07:48, 12.39s/it] + 72%|███████▏ | 5308/7378 [18:12:24<7:06:05, 12.35s/it] + +{'loss': 0.4342, 'learning_rate': 3.8527686585790345e-06, 'epoch': 0.72} + + 72%|███████▏ | 5308/7378 [18:12:24<7:06:05, 12.35s/it] + 72%|███████▏ | 5309/7378 [18:12:36<7:04:16, 12.30s/it] + +{'loss': 0.4381, 'learning_rate': 3.849306549696087e-06, 'epoch': 0.72} + + 72%|███████▏ | 5309/7378 [18:12:36<7:04:16, 12.30s/it] + 72%|███████▏ | 5310/7378 [18:12:48<7:01:45, 12.24s/it] + +{'loss': 0.3966, 'learning_rate': 3.845845626262913e-06, 'epoch': 0.72} + + 72%|███████▏ | 5310/7378 [18:12:48<7:01:45, 12.24s/it] + 72%|███████▏ | 5311/7378 [18:13:01<7:06:47, 12.39s/it] + +{'loss': 0.484, 'learning_rate': 3.842385888946548e-06, 'epoch': 0.72} + + 72%|███████▏ | 5311/7378 [18:13:01<7:06:47, 12.39s/it] + 72%|███████▏ | 5312/7378 [18:13:13<7:05:20, 12.35s/it] + +{'loss': 0.4399, 'learning_rate': 3.838927338413804e-06, 'epoch': 0.72} + + 72%|███████▏ | 5312/7378 
[18:13:13<7:05:20, 12.35s/it] + 72%|███████▏ | 5313/7378 [18:13:26<7:06:15, 12.39s/it] + +{'loss': 0.4428, 'learning_rate': 3.835469975331256e-06, 'epoch': 0.72} + + 72%|███████▏ | 5313/7378 [18:13:26<7:06:15, 12.39s/it] + 72%|███████▏ | 5314/7378 [18:13:38<7:03:53, 12.32s/it] + +{'loss': 0.4391, 'learning_rate': 3.832013800365266e-06, 'epoch': 0.72} + + 72%|███████▏ | 5314/7378 [18:13:38<7:03:53, 12.32s/it] + 72%|███████▏ | 5315/7378 [18:13:50<7:03:19, 12.31s/it] + +{'loss': 0.3878, 'learning_rate': 3.8285588141819545e-06, 'epoch': 0.72} + + 72%|███████▏ | 5315/7378 [18:13:50<7:03:19, 12.31s/it] + 72%|███████▏ | 5316/7378 [18:14:06<7:35:06, 13.24s/it] + +{'loss': 0.5202, 'learning_rate': 3.825105017447213e-06, 'epoch': 0.72} + + 72%|███████▏ | 5316/7378 [18:14:06<7:35:06, 13.24s/it] + 72%|███████▏ | 5317/7378 [18:14:18<7:22:46, 12.89s/it] + +{'loss': 0.4427, 'learning_rate': 3.8216524108267085e-06, 'epoch': 0.72} + + 72%|███████▏ | 5317/7378 [18:14:18<7:22:46, 12.89s/it] + 72%|███████▏ | 5318/7378 [18:14:30<7:15:42, 12.69s/it] + +{'loss': 0.3589, 'learning_rate': 3.818200994985872e-06, 'epoch': 0.72} + + 72%|███████▏ | 5318/7378 [18:14:30<7:15:42, 12.69s/it] + 72%|███████▏ | 5319/7378 [18:14:42<7:13:08, 12.62s/it] + +{'loss': 0.4314, 'learning_rate': 3.81475077058992e-06, 'epoch': 0.72} + + 72%|███████▏ | 5319/7378 [18:14:42<7:13:08, 12.62s/it] + 72%|███████▏ | 5320/7378 [18:14:54<7:06:52, 12.45s/it] + +{'loss': 0.458, 'learning_rate': 3.811301738303823e-06, 'epoch': 0.72} + + 72%|███████▏ | 5320/7378 [18:14:54<7:06:52, 12.45s/it] + 72%|███████▏ | 5321/7378 [18:15:07<7:10:12, 12.55s/it] + +{'loss': 0.45, 'learning_rate': 3.8078538987923284e-06, 'epoch': 0.72} + + 72%|███████▏ | 5321/7378 [18:15:07<7:10:12, 12.55s/it] + 72%|███████▏ | 5322/7378 [18:15:19<7:06:39, 12.45s/it] + +{'loss': 0.3781, 'learning_rate': 3.804407252719949e-06, 'epoch': 0.72} + + 72%|███████▏ | 5322/7378 [18:15:19<7:06:39, 12.45s/it] + 72%|███████▏ | 5323/7378 [18:15:32<7:05:20, 12.42s/it] + 
+{'loss': 0.4479, 'learning_rate': 3.8009618007509807e-06, 'epoch': 0.72} + + 72%|███████▏ | 5323/7378 [18:15:32<7:05:20, 12.42s/it] + 72%|███████▏ | 5324/7378 [18:15:44<7:04:14, 12.39s/it] + +{'loss': 0.4577, 'learning_rate': 3.797517543549476e-06, 'epoch': 0.72} + + 72%|███████▏ | 5324/7378 [18:15:44<7:04:14, 12.39s/it] + 72%|███████▏ | 5325/7378 [18:15:56<7:02:38, 12.35s/it] + +{'loss': 0.4235, 'learning_rate': 3.794074481779261e-06, 'epoch': 0.72} + + 72%|███████▏ | 5325/7378 [18:15:56<7:02:38, 12.35s/it] + 72%|███████▏ | 5326/7378 [18:16:09<7:01:44, 12.33s/it] + +{'loss': 0.5097, 'learning_rate': 3.790632616103932e-06, 'epoch': 0.72} + + 72%|███████▏ | 5326/7378 [18:16:09<7:01:44, 12.33s/it] + 72%|███████▏ | 5327/7378 [18:16:21<6:59:35, 12.27s/it] + +{'loss': 0.4643, 'learning_rate': 3.7871919471868525e-06, 'epoch': 0.72} + + 72%|███████▏ | 5327/7378 [18:16:21<6:59:35, 12.27s/it] + 72%|███████▏ | 5328/7378 [18:16:33<6:58:07, 12.24s/it] + +{'loss': 0.427, 'learning_rate': 3.7837524756911625e-06, 'epoch': 0.72} + + 72%|███████▏ | 5328/7378 [18:16:33<6:58:07, 12.24s/it] + 72%|███████▏ | 5329/7378 [18:16:45<7:00:07, 12.30s/it] + +{'loss': 0.4718, 'learning_rate': 3.7803142022797632e-06, 'epoch': 0.72} + + 72%|███████▏ | 5329/7378 [18:16:45<7:00:07, 12.30s/it] + 72%|███████▏ | 5330/7378 [18:16:57<6:57:38, 12.24s/it] + +{'loss': 0.4779, 'learning_rate': 3.776877127615329e-06, 'epoch': 0.72} + + 72%|███████▏ | 5330/7378 [18:16:58<6:57:38, 12.24s/it] + 72%|███████▏ | 5331/7378 [18:17:10<6:59:10, 12.29s/it] + +{'loss': 0.4465, 'learning_rate': 3.7734412523603027e-06, 'epoch': 0.72} + + 72%|███████▏ | 5331/7378 [18:17:10<6:59:10, 12.29s/it] + 72%|███████▏ | 5332/7378 [18:17:22<6:56:53, 12.23s/it] + +{'loss': 0.4613, 'learning_rate': 3.770006577176889e-06, 'epoch': 0.72} + + 72%|███████▏ | 5332/7378 [18:17:22<6:56:53, 12.23s/it] + 72%|███████▏ | 5333/7378 [18:17:34<6:58:38, 12.28s/it] + +{'loss': 0.415, 'learning_rate': 3.766573102727078e-06, 'epoch': 0.72} + + 
72%|███████▏ | 5333/7378 [18:17:34<6:58:38, 12.28s/it] + 72%|███████▏ | 5334/7378 [18:17:47<6:57:48, 12.26s/it] + +{'loss': 0.4926, 'learning_rate': 3.7631408296726126e-06, 'epoch': 0.72} + + 72%|███████▏ | 5334/7378 [18:17:47<6:57:48, 12.26s/it] + 72%|███████▏ | 5335/7378 [18:17:59<6:56:58, 12.25s/it] + +{'loss': 0.5014, 'learning_rate': 3.7597097586750097e-06, 'epoch': 0.72} + + 72%|███████▏ | 5335/7378 [18:17:59<6:56:58, 12.25s/it] + 72%|███████▏ | 5336/7378 [18:18:11<7:00:28, 12.35s/it] + +{'loss': 0.4951, 'learning_rate': 3.756279890395551e-06, 'epoch': 0.72} + + 72%|███████▏ | 5336/7378 [18:18:11<7:00:28, 12.35s/it] + 72%|███████▏ | 5337/7378 [18:18:24<6:59:53, 12.34s/it] + +{'loss': 0.3544, 'learning_rate': 3.7528512254952975e-06, 'epoch': 0.72} + + 72%|███████▏ | 5337/7378 [18:18:24<6:59:53, 12.34s/it] + 72%|███████▏ | 5338/7378 [18:18:36<6:57:04, 12.27s/it] + +{'loss': 0.4785, 'learning_rate': 3.7494237646350675e-06, 'epoch': 0.72} + + 72%|███████▏ | 5338/7378 [18:18:36<6:57:04, 12.27s/it] + 72%|███████▏ | 5339/7378 [18:18:48<6:54:32, 12.20s/it] + +{'loss': 0.4365, 'learning_rate': 3.74599750847545e-06, 'epoch': 0.72} + + 72%|███████▏ | 5339/7378 [18:18:48<6:54:32, 12.20s/it] + 72%|███████▏ | 5340/7378 [18:19:00<6:54:42, 12.21s/it] + +{'loss': 0.4569, 'learning_rate': 3.742572457676801e-06, 'epoch': 0.72} + + 72%|███████▏ | 5340/7378 [18:19:00<6:54:42, 12.21s/it] + 72%|███████▏ | 5341/7378 [18:19:12<6:56:40, 12.27s/it] + +{'loss': 0.4379, 'learning_rate': 3.739148612899243e-06, 'epoch': 0.72} + + 72%|███████▏ | 5341/7378 [18:19:12<6:56:40, 12.27s/it] + 72%|███████▏ | 5342/7378 [18:19:25<7:02:34, 12.45s/it] + +{'loss': 0.4663, 'learning_rate': 3.735725974802675e-06, 'epoch': 0.72} + + 72%|███████▏ | 5342/7378 [18:19:25<7:02:34, 12.45s/it] + 72%|███████▏ | 5343/7378 [18:19:38<7:01:52, 12.44s/it] + +{'loss': 0.4348, 'learning_rate': 3.7323045440467543e-06, 'epoch': 0.72} + + 72%|███████▏ | 5343/7378 [18:19:38<7:01:52, 12.44s/it] + 72%|███████▏ | 5344/7378 
[18:19:50<7:02:04, 12.45s/it] + +{'loss': 0.4152, 'learning_rate': 3.7288843212909065e-06, 'epoch': 0.72} + + 72%|███████▏ | 5344/7378 [18:19:50<7:02:04, 12.45s/it] + 72%|███████▏ | 5345/7378 [18:20:02<6:57:24, 12.32s/it] + +{'loss': 0.4022, 'learning_rate': 3.7254653071943235e-06, 'epoch': 0.72} + + 72%|███████▏ | 5345/7378 [18:20:02<6:57:24, 12.32s/it] + 72%|███████▏ | 5346/7378 [18:20:14<6:56:25, 12.30s/it] + +{'loss': 0.4867, 'learning_rate': 3.7220475024159743e-06, 'epoch': 0.72} + + 72%|███████▏ | 5346/7378 [18:20:14<6:56:25, 12.30s/it] + 72%|███████▏ | 5347/7378 [18:20:27<6:55:50, 12.28s/it] + +{'loss': 0.4774, 'learning_rate': 3.718630907614582e-06, 'epoch': 0.72} + + 72%|███████▏ | 5347/7378 [18:20:27<6:55:50, 12.28s/it] + 72%|███████▏ | 5348/7378 [18:20:39<6:54:47, 12.26s/it] + +{'loss': 0.51, 'learning_rate': 3.715215523448642e-06, 'epoch': 0.72} + + 72%|███████▏ | 5348/7378 [18:20:39<6:54:47, 12.26s/it] + 72%|███████▏ | 5349/7378 [18:20:51<6:53:38, 12.23s/it] + +{'loss': 0.4402, 'learning_rate': 3.711801350576417e-06, 'epoch': 0.72} + + 72%|███████▏ | 5349/7378 [18:20:51<6:53:38, 12.23s/it] + 73%|███████▎ | 5350/7378 [18:21:04<6:56:33, 12.32s/it] + +{'loss': 0.4928, 'learning_rate': 3.7083883896559326e-06, 'epoch': 0.73} + + 73%|███████▎ | 5350/7378 [18:21:04<6:56:33, 12.32s/it] + 73%|███████▎ | 5351/7378 [18:21:16<7:00:02, 12.43s/it] + +{'loss': 0.4408, 'learning_rate': 3.704976641344985e-06, 'epoch': 0.73} + + 73%|███████▎ | 5351/7378 [18:21:16<7:00:02, 12.43s/it] + 73%|███████▎ | 5352/7378 [18:21:29<6:57:22, 12.36s/it] + +{'loss': 0.4434, 'learning_rate': 3.70156610630114e-06, 'epoch': 0.73} + + 73%|███████▎ | 5352/7378 [18:21:29<6:57:22, 12.36s/it] + 73%|███████▎ | 5353/7378 [18:21:41<6:57:48, 12.38s/it] + +{'loss': 0.4581, 'learning_rate': 3.69815678518172e-06, 'epoch': 0.73} + + 73%|███████▎ | 5353/7378 [18:21:41<6:57:48, 12.38s/it] + 73%|███████▎ | 5354/7378 [18:21:53<6:56:45, 12.35s/it] + +{'loss': 0.4362, 'learning_rate': 
3.6947486786438193e-06, 'epoch': 0.73} + + 73%|███████▎ | 5354/7378 [18:21:53<6:56:45, 12.35s/it] + 73%|███████▎ | 5355/7378 [18:22:06<6:55:41, 12.33s/it] + +{'loss': 0.4441, 'learning_rate': 3.6913417873442937e-06, 'epoch': 0.73} + + 73%|███████▎ | 5355/7378 [18:22:06<6:55:41, 12.33s/it] + 73%|███████▎ | 5356/7378 [18:22:18<6:55:04, 12.32s/it] + +{'loss': 0.4342, 'learning_rate': 3.687936111939775e-06, 'epoch': 0.73} + + 73%|███████▎ | 5356/7378 [18:22:18<6:55:04, 12.32s/it] + 73%|███████▎ | 5357/7378 [18:22:30<6:52:52, 12.26s/it] + +{'loss': 0.3715, 'learning_rate': 3.6845316530866493e-06, 'epoch': 0.73} + + 73%|███████▎ | 5357/7378 [18:22:30<6:52:52, 12.26s/it] + 73%|███████▎ | 5358/7378 [18:22:42<6:53:17, 12.28s/it] + +{'loss': 0.5092, 'learning_rate': 3.681128411441074e-06, 'epoch': 0.73} + + 73%|███████▎ | 5358/7378 [18:22:42<6:53:17, 12.28s/it] + 73%|███████▎ | 5359/7378 [18:22:55<6:56:43, 12.38s/it] + +{'loss': 0.3903, 'learning_rate': 3.6777263876589697e-06, 'epoch': 0.73} + + 73%|███████▎ | 5359/7378 [18:22:55<6:56:43, 12.38s/it] + 73%|███████▎ | 5360/7378 [18:23:07<6:57:05, 12.40s/it] + +{'loss': 0.4945, 'learning_rate': 3.67432558239602e-06, 'epoch': 0.73} + + 73%|███████▎ | 5360/7378 [18:23:07<6:57:05, 12.40s/it] + 73%|███████▎ | 5361/7378 [18:23:19<6:54:18, 12.32s/it] + +{'loss': 0.3943, 'learning_rate': 3.6709259963076836e-06, 'epoch': 0.73} + + 73%|███████▎ | 5361/7378 [18:23:19<6:54:18, 12.32s/it] + 73%|███████▎ | 5362/7378 [18:23:32<6:51:46, 12.26s/it] + +{'loss': 0.4333, 'learning_rate': 3.6675276300491738e-06, 'epoch': 0.73} + + 73%|███████▎ | 5362/7378 [18:23:32<6:51:46, 12.26s/it] + 73%|███████▎ | 5363/7378 [18:23:44<6:50:50, 12.23s/it] + +{'loss': 0.496, 'learning_rate': 3.664130484275473e-06, 'epoch': 0.73} + + 73%|███████▎ | 5363/7378 [18:23:44<6:50:50, 12.23s/it] + 73%|███████▎ | 5364/7378 [18:23:56<6:49:45, 12.21s/it] + +{'loss': 0.4262, 'learning_rate': 3.6607345596413247e-06, 'epoch': 0.73} + + 73%|███████▎ | 5364/7378 
[18:23:56<6:49:45, 12.21s/it] + 73%|███████▎ | 5365/7378 [18:24:08<6:52:19, 12.29s/it] + +{'loss': 0.4257, 'learning_rate': 3.657339856801245e-06, 'epoch': 0.73} + + 73%|███████▎ | 5365/7378 [18:24:08<6:52:19, 12.29s/it] + 73%|███████▎ | 5366/7378 [18:24:20<6:49:28, 12.21s/it] + +{'loss': 0.5303, 'learning_rate': 3.6539463764095095e-06, 'epoch': 0.73} + + 73%|███████▎ | 5366/7378 [18:24:20<6:49:28, 12.21s/it] + 73%|███████▎ | 5367/7378 [18:24:33<6:50:58, 12.26s/it] + +{'loss': 0.4335, 'learning_rate': 3.6505541191201554e-06, 'epoch': 0.73} + + 73%|███████▎ | 5367/7378 [18:24:33<6:50:58, 12.26s/it] + 73%|███████▎ | 5368/7378 [18:24:45<6:51:40, 12.29s/it] + +{'loss': 0.437, 'learning_rate': 3.647163085586989e-06, 'epoch': 0.73} + + 73%|███████▎ | 5368/7378 [18:24:45<6:51:40, 12.29s/it] + 73%|███████▎ | 5369/7378 [18:24:57<6:51:13, 12.28s/it] + +{'loss': 0.4708, 'learning_rate': 3.6437732764635737e-06, 'epoch': 0.73} + + 73%|███████▎ | 5369/7378 [18:24:57<6:51:13, 12.28s/it] + 73%|███████▎ | 5370/7378 [18:25:10<6:50:32, 12.27s/it] + +{'loss': 0.4554, 'learning_rate': 3.6403846924032502e-06, 'epoch': 0.73} + + 73%|███████▎ | 5370/7378 [18:25:10<6:50:32, 12.27s/it] + 73%|███████▎ | 5371/7378 [18:25:22<6:47:29, 12.18s/it] + +{'loss': 0.4179, 'learning_rate': 3.6369973340591114e-06, 'epoch': 0.73} + + 73%|███████▎ | 5371/7378 [18:25:22<6:47:29, 12.18s/it] + 73%|███████▎ | 5372/7378 [18:25:34<6:52:42, 12.34s/it] + +{'loss': 0.4161, 'learning_rate': 3.6336112020840176e-06, 'epoch': 0.73} + + 73%|███████▎ | 5372/7378 [18:25:34<6:52:42, 12.34s/it] + 73%|███████▎ | 5373/7378 [18:25:47<6:51:16, 12.31s/it] + +{'loss': 0.415, 'learning_rate': 3.630226297130589e-06, 'epoch': 0.73} + + 73%|███████▎ | 5373/7378 [18:25:47<6:51:16, 12.31s/it] + 73%|███████▎ | 5374/7378 [18:25:59<6:53:23, 12.38s/it] + +{'loss': 0.4531, 'learning_rate': 3.6268426198512197e-06, 'epoch': 0.73} + + 73%|███████▎ | 5374/7378 [18:25:59<6:53:23, 12.38s/it] + 73%|███████▎ | 5375/7378 [18:26:11<6:51:26, 
12.32s/it] + +{'loss': 0.5452, 'learning_rate': 3.6234601708980576e-06, 'epoch': 0.73} + + 73%|███████▎ | 5375/7378 [18:26:11<6:51:26, 12.32s/it] + 73%|███████▎ | 5376/7378 [18:26:24<6:52:12, 12.35s/it] + +{'loss': 0.4013, 'learning_rate': 3.620078950923016e-06, 'epoch': 0.73} + + 73%|███████▎ | 5376/7378 [18:26:24<6:52:12, 12.35s/it] + 73%|███████▎ | 5377/7378 [18:26:36<6:48:58, 12.26s/it] + +{'loss': 0.4174, 'learning_rate': 3.6166989605777727e-06, 'epoch': 0.73} + + 73%|███████▎ | 5377/7378 [18:26:36<6:48:58, 12.26s/it] + 73%|███████▎ | 5378/7378 [18:26:48<6:46:50, 12.21s/it] + +{'loss': 0.4704, 'learning_rate': 3.6133202005137647e-06, 'epoch': 0.73} + + 73%|███████▎ | 5378/7378 [18:26:48<6:46:50, 12.21s/it] + 73%|███████▎ | 5379/7378 [18:27:00<6:50:27, 12.32s/it] + +{'loss': 0.4366, 'learning_rate': 3.6099426713822006e-06, 'epoch': 0.73} + + 73%|███████▎ | 5379/7378 [18:27:00<6:50:27, 12.32s/it] + 73%|███████▎ | 5380/7378 [18:27:13<6:52:01, 12.37s/it] + +{'loss': 0.4773, 'learning_rate': 3.606566373834044e-06, 'epoch': 0.73} + + 73%|███████▎ | 5380/7378 [18:27:13<6:52:01, 12.37s/it] + 73%|███████▎ | 5381/7378 [18:27:25<6:50:42, 12.34s/it] + +{'loss': 0.4353, 'learning_rate': 3.6031913085200222e-06, 'epoch': 0.73} + + 73%|███████▎ | 5381/7378 [18:27:25<6:50:42, 12.34s/it] + 73%|███████▎ | 5382/7378 [18:27:41<7:24:20, 13.36s/it] + +{'loss': 0.4531, 'learning_rate': 3.5998174760906233e-06, 'epoch': 0.73} + + 73%|███████▎ | 5382/7378 [18:27:41<7:24:20, 13.36s/it] + 73%|███████▎ | 5383/7378 [18:27:53<7:15:03, 13.08s/it] + +{'loss': 0.4726, 'learning_rate': 3.596444877196109e-06, 'epoch': 0.73} + + 73%|███████▎ | 5383/7378 [18:27:53<7:15:03, 13.08s/it] + 73%|███████▎ | 5384/7378 [18:28:13<8:15:51, 14.92s/it] + +{'loss': 0.397, 'learning_rate': 3.593073512486489e-06, 'epoch': 0.73} + + 73%|███████▎ | 5384/7378 [18:28:13<8:15:51, 14.92s/it] + 73%|███████▎ | 5385/7378 [18:28:25<7:50:43, 14.17s/it] + +{'loss': 0.4377, 'learning_rate': 3.5897033826115424e-06, 'epoch': 
0.73} + + 73%|███████▎ | 5385/7378 [18:28:25<7:50:43, 14.17s/it] + 73%|███████▎ | 5386/7378 [18:28:37<7:33:09, 13.65s/it] + +{'loss': 0.4627, 'learning_rate': 3.5863344882208084e-06, 'epoch': 0.73} + + 73%|███████▎ | 5386/7378 [18:28:37<7:33:09, 13.65s/it] + 73%|███████▎ | 5387/7378 [18:28:54<7:58:37, 14.42s/it] + +{'loss': 0.4233, 'learning_rate': 3.5829668299635856e-06, 'epoch': 0.73} + + 73%|███████▎ | 5387/7378 [18:28:54<7:58:37, 14.42s/it] + 73%|███████▎ | 5388/7378 [18:29:06<7:39:46, 13.86s/it] + +{'loss': 0.3966, 'learning_rate': 3.5796004084889436e-06, 'epoch': 0.73} + + 73%|███████▎ | 5388/7378 [18:29:06<7:39:46, 13.86s/it] + 73%|███████▎ | 5389/7378 [18:29:21<7:49:48, 14.17s/it] + +{'loss': 0.4073, 'learning_rate': 3.5762352244457045e-06, 'epoch': 0.73} + + 73%|███████▎ | 5389/7378 [18:29:21<7:49:48, 14.17s/it] + 73%|███████▎ | 5390/7378 [18:29:33<7:27:53, 13.52s/it] + +{'loss': 0.4674, 'learning_rate': 3.572871278482455e-06, 'epoch': 0.73} + + 73%|███████▎ | 5390/7378 [18:29:33<7:27:53, 13.52s/it] + 73%|███████▎ | 5391/7378 [18:29:48<7:45:04, 14.04s/it] + +{'loss': 0.4028, 'learning_rate': 3.5695085712475417e-06, 'epoch': 0.73} + + 73%|███████▎ | 5391/7378 [18:29:48<7:45:04, 14.04s/it] + 73%|███████▎ | 5392/7378 [18:30:00<7:25:18, 13.45s/it] + +{'loss': 0.3955, 'learning_rate': 3.5661471033890714e-06, 'epoch': 0.73} + + 73%|███████▎ | 5392/7378 [18:30:00<7:25:18, 13.45s/it] + 73%|███████▎ | 5393/7378 [18:30:13<7:14:06, 13.12s/it] + +{'loss': 0.3919, 'learning_rate': 3.562786875554918e-06, 'epoch': 0.73} + + 73%|███████▎ | 5393/7378 [18:30:13<7:14:06, 13.12s/it] + 73%|███████▎ | 5394/7378 [18:30:25<7:02:53, 12.79s/it] + +{'loss': 0.3862, 'learning_rate': 3.559427888392716e-06, 'epoch': 0.73} + + 73%|███████▎ | 5394/7378 [18:30:25<7:02:53, 12.79s/it] + 73%|███████▎ | 5395/7378 [18:30:37<6:59:12, 12.68s/it] + +{'loss': 0.3996, 'learning_rate': 3.5560701425498536e-06, 'epoch': 0.73} + + 73%|███████▎ | 5395/7378 [18:30:37<6:59:12, 12.68s/it] + 73%|███████▎ | 
5396/7378 [18:30:49<6:52:30, 12.49s/it] + +{'loss': 0.4218, 'learning_rate': 3.5527136386734827e-06, 'epoch': 0.73} + + 73%|███████▎ | 5396/7378 [18:30:49<6:52:30, 12.49s/it] + 73%|███████▎ | 5397/7378 [18:31:02<6:49:58, 12.42s/it] + +{'loss': 0.4903, 'learning_rate': 3.5493583774105157e-06, 'epoch': 0.73} + + 73%|███████▎ | 5397/7378 [18:31:02<6:49:58, 12.42s/it] + 73%|███████▎ | 5398/7378 [18:31:14<6:52:58, 12.51s/it] + +{'loss': 0.4528, 'learning_rate': 3.546004359407632e-06, 'epoch': 0.73} + + 73%|███████▎ | 5398/7378 [18:31:14<6:52:58, 12.51s/it] + 73%|███████▎ | 5399/7378 [18:31:27<6:53:44, 12.54s/it] + +{'loss': 0.5092, 'learning_rate': 3.5426515853112643e-06, 'epoch': 0.73} + + 73%|███████▎ | 5399/7378 [18:31:27<6:53:44, 12.54s/it] + 73%|███████▎ | 5400/7378 [18:31:39<6:47:45, 12.37s/it] + +{'loss': 0.3735, 'learning_rate': 3.5393000557676037e-06, 'epoch': 0.73} + + 73%|███████▎ | 5400/7378 [18:31:39<6:47:45, 12.37s/it] + 73%|███████▎ | 5401/7378 [18:31:51<6:44:27, 12.28s/it] + +{'loss': 0.3824, 'learning_rate': 3.5359497714226086e-06, 'epoch': 0.73} + + 73%|███████▎ | 5401/7378 [18:31:51<6:44:27, 12.28s/it] + 73%|███████▎ | 5402/7378 [18:32:03<6:43:06, 12.24s/it] + +{'loss': 0.4473, 'learning_rate': 3.532600732921989e-06, 'epoch': 0.73} + + 73%|███████▎ | 5402/7378 [18:32:03<6:43:06, 12.24s/it] + 73%|███████▎ | 5403/7378 [18:32:16<6:47:12, 12.37s/it] + +{'loss': 0.5177, 'learning_rate': 3.5292529409112264e-06, 'epoch': 0.73} + + 73%|███████▎ | 5403/7378 [18:32:16<6:47:12, 12.37s/it] + 73%|███████▎ | 5404/7378 [18:32:28<6:45:42, 12.33s/it] + +{'loss': 0.4734, 'learning_rate': 3.525906396035552e-06, 'epoch': 0.73} + + 73%|███████▎ | 5404/7378 [18:32:28<6:45:42, 12.33s/it] + 73%|███████▎ | 5405/7378 [18:32:40<6:45:33, 12.33s/it] + +{'loss': 0.4057, 'learning_rate': 3.5225610989399593e-06, 'epoch': 0.73} + + 73%|███████▎ | 5405/7378 [18:32:40<6:45:33, 12.33s/it] + 73%|███████▎ | 5406/7378 [18:32:53<6:45:54, 12.35s/it] + +{'loss': 0.4372, 'learning_rate': 
3.5192170502691993e-06, 'epoch': 0.73} + + 73%|███████▎ | 5406/7378 [18:32:53<6:45:54, 12.35s/it] + 73%|███████▎ | 5407/7378 [18:33:05<6:44:56, 12.33s/it] + +{'loss': 0.4485, 'learning_rate': 3.515874250667791e-06, 'epoch': 0.73} + + 73%|███████▎ | 5407/7378 [18:33:05<6:44:56, 12.33s/it] + 73%|███████▎ | 5408/7378 [18:33:17<6:43:35, 12.29s/it] + +{'loss': 0.4834, 'learning_rate': 3.5125327007800037e-06, 'epoch': 0.73} + + 73%|███████▎ | 5408/7378 [18:33:17<6:43:35, 12.29s/it] + 73%|███████▎ | 5409/7378 [18:33:29<6:41:34, 12.24s/it] + +{'loss': 0.4304, 'learning_rate': 3.509192401249869e-06, 'epoch': 0.73} + + 73%|███████▎ | 5409/7378 [18:33:29<6:41:34, 12.24s/it] + 73%|███████▎ | 5410/7378 [18:33:42<6:43:30, 12.30s/it] + +{'loss': 0.432, 'learning_rate': 3.505853352721177e-06, 'epoch': 0.73} + + 73%|███████▎ | 5410/7378 [18:33:42<6:43:30, 12.30s/it] + 73%|███████▎ | 5411/7378 [18:33:54<6:42:28, 12.28s/it] + +{'loss': 0.3695, 'learning_rate': 3.5025155558374735e-06, 'epoch': 0.73} + + 73%|███████▎ | 5411/7378 [18:33:54<6:42:28, 12.28s/it] + 73%|███████▎ | 5412/7378 [18:34:06<6:38:02, 12.15s/it] + +{'loss': 0.5108, 'learning_rate': 3.499179011242073e-06, 'epoch': 0.73} + + 73%|███████▎ | 5412/7378 [18:34:06<6:38:02, 12.15s/it] + 73%|███████▎ | 5413/7378 [18:34:18<6:35:57, 12.09s/it] + +{'loss': 0.3921, 'learning_rate': 3.4958437195780394e-06, 'epoch': 0.73} + + 73%|███████▎ | 5413/7378 [18:34:18<6:35:57, 12.09s/it] + 73%|███████▎ | 5414/7378 [18:34:30<6:34:38, 12.06s/it] + +{'loss': 0.4187, 'learning_rate': 3.4925096814881988e-06, 'epoch': 0.73} + + 73%|███████▎ | 5414/7378 [18:34:30<6:34:38, 12.06s/it] + 73%|███████▎ | 5415/7378 [18:34:42<6:39:32, 12.21s/it] + +{'loss': 0.4298, 'learning_rate': 3.4891768976151284e-06, 'epoch': 0.73} + + 73%|███████▎ | 5415/7378 [18:34:42<6:39:32, 12.21s/it] + 73%|███████▎ | 5416/7378 [18:34:55<6:45:55, 12.41s/it] + +{'loss': 0.4352, 'learning_rate': 3.4858453686011808e-06, 'epoch': 0.73} + + 73%|███████▎ | 5416/7378 
[18:34:55<6:45:55, 12.41s/it] + 73%|███████▎ | 5417/7378 [18:35:08<6:48:09, 12.49s/it] + +{'loss': 0.4653, 'learning_rate': 3.482515095088449e-06, 'epoch': 0.73} + + 73%|███████▎ | 5417/7378 [18:35:08<6:48:09, 12.49s/it] + 73%|███████▎ | 5418/7378 [18:35:20<6:46:28, 12.44s/it] + +{'loss': 0.4217, 'learning_rate': 3.4791860777187924e-06, 'epoch': 0.73} + + 73%|███████▎ | 5418/7378 [18:35:20<6:46:28, 12.44s/it] + 73%|███████▎ | 5419/7378 [18:35:33<6:46:03, 12.44s/it] + +{'loss': 0.4642, 'learning_rate': 3.4758583171338277e-06, 'epoch': 0.73} + + 73%|███████▎ | 5419/7378 [18:35:33<6:46:03, 12.44s/it] + 73%|███████▎ | 5420/7378 [18:35:45<6:41:23, 12.30s/it] + +{'loss': 0.4413, 'learning_rate': 3.4725318139749255e-06, 'epoch': 0.73} + + 73%|███████▎ | 5420/7378 [18:35:45<6:41:23, 12.30s/it] + 73%|███████▎ | 5421/7378 [18:35:57<6:42:01, 12.33s/it] + +{'loss': 0.3877, 'learning_rate': 3.4692065688832223e-06, 'epoch': 0.73} + + 73%|███████▎ | 5421/7378 [18:35:57<6:42:01, 12.33s/it] + 73%|███████▎ | 5422/7378 [18:36:09<6:42:06, 12.33s/it] + +{'loss': 0.4876, 'learning_rate': 3.4658825824996036e-06, 'epoch': 0.73} + + 73%|███████▎ | 5422/7378 [18:36:09<6:42:06, 12.33s/it] + 74%|███████▎ | 5423/7378 [18:36:22<6:42:12, 12.34s/it] + +{'loss': 0.4138, 'learning_rate': 3.4625598554647177e-06, 'epoch': 0.74} + + 74%|███████▎ | 5423/7378 [18:36:22<6:42:12, 12.34s/it] + 74%|███████▎ | 5424/7378 [18:36:34<6:40:55, 12.31s/it] + +{'loss': 0.3667, 'learning_rate': 3.459238388418963e-06, 'epoch': 0.74} + + 74%|███████▎ | 5424/7378 [18:36:34<6:40:55, 12.31s/it] + 74%|███████▎ | 5425/7378 [18:36:46<6:38:17, 12.24s/it] + +{'loss': 0.4651, 'learning_rate': 3.4559181820025067e-06, 'epoch': 0.74} + + 74%|███████▎ | 5425/7378 [18:36:46<6:38:17, 12.24s/it] + 74%|███████▎ | 5426/7378 [18:36:58<6:40:01, 12.30s/it] + +{'loss': 0.4038, 'learning_rate': 3.4525992368552652e-06, 'epoch': 0.74} + + 74%|███████▎ | 5426/7378 [18:36:58<6:40:01, 12.30s/it] + 74%|███████▎ | 5427/7378 [18:37:11<6:38:44, 
12.26s/it] + +{'loss': 0.4663, 'learning_rate': 3.449281553616911e-06, 'epoch': 0.74} + + 74%|███████▎ | 5427/7378 [18:37:11<6:38:44, 12.26s/it] + 74%|███████▎ | 5428/7378 [18:37:23<6:40:43, 12.33s/it] + +{'loss': 0.3642, 'learning_rate': 3.445965132926877e-06, 'epoch': 0.74} + + 74%|███████▎ | 5428/7378 [18:37:23<6:40:43, 12.33s/it] + 74%|███████▎ | 5429/7378 [18:37:35<6:36:34, 12.21s/it] + +{'loss': 0.4146, 'learning_rate': 3.442649975424347e-06, 'epoch': 0.74} + + 74%|███████▎ | 5429/7378 [18:37:35<6:36:34, 12.21s/it] + 74%|███████▎ | 5430/7378 [18:37:48<6:38:49, 12.28s/it] + +{'loss': 0.4201, 'learning_rate': 3.4393360817482733e-06, 'epoch': 0.74} + + 74%|███████▎ | 5430/7378 [18:37:48<6:38:49, 12.28s/it] + 74%|███████▎ | 5431/7378 [18:38:00<6:36:48, 12.23s/it] + +{'loss': 0.4614, 'learning_rate': 3.4360234525373528e-06, 'epoch': 0.74} + + 74%|███████▎ | 5431/7378 [18:38:00<6:36:48, 12.23s/it] + 74%|███████▎ | 5432/7378 [18:38:12<6:38:08, 12.28s/it] + +{'loss': 0.4945, 'learning_rate': 3.4327120884300437e-06, 'epoch': 0.74} + + 74%|███████▎ | 5432/7378 [18:38:12<6:38:08, 12.28s/it] + 74%|███████▎ | 5433/7378 [18:38:24<6:37:47, 12.27s/it] + +{'loss': 0.4106, 'learning_rate': 3.429401990064555e-06, 'epoch': 0.74} + + 74%|███████▎ | 5433/7378 [18:38:24<6:37:47, 12.27s/it] + 74%|███████▎ | 5434/7378 [18:38:37<6:38:34, 12.30s/it] + +{'loss': 0.4282, 'learning_rate': 3.4260931580788635e-06, 'epoch': 0.74} + + 74%|███████▎ | 5434/7378 [18:38:37<6:38:34, 12.30s/it] + 74%|███████▎ | 5435/7378 [18:38:49<6:35:07, 12.20s/it] + +{'loss': 0.4375, 'learning_rate': 3.422785593110692e-06, 'epoch': 0.74} + + 74%|███████▎ | 5435/7378 [18:38:49<6:35:07, 12.20s/it] + 74%|███████▎ | 5436/7378 [18:39:01<6:35:07, 12.21s/it] + +{'loss': 0.4928, 'learning_rate': 3.419479295797522e-06, 'epoch': 0.74} + + 74%|███████▎ | 5436/7378 [18:39:01<6:35:07, 12.21s/it] + 74%|███████▎ | 5437/7378 [18:39:14<6:44:15, 12.50s/it] + +{'loss': 0.484, 'learning_rate': 3.4161742667765853e-06, 'epoch': 0.74} 
+ + 74%|███████▎ | 5437/7378 [18:39:14<6:44:15, 12.50s/it] + 74%|███████▎ | 5438/7378 [18:39:26<6:40:54, 12.40s/it] + +{'loss': 0.4822, 'learning_rate': 3.4128705066848832e-06, 'epoch': 0.74} + + 74%|███████▎ | 5438/7378 [18:39:26<6:40:54, 12.40s/it] + 74%|███████▎ | 5439/7378 [18:39:38<6:38:51, 12.34s/it] + +{'loss': 0.43, 'learning_rate': 3.409568016159155e-06, 'epoch': 0.74} + + 74%|███████▎ | 5439/7378 [18:39:38<6:38:51, 12.34s/it] + 74%|███████▎ | 5440/7378 [18:39:51<6:38:47, 12.35s/it] + +{'loss': 0.4322, 'learning_rate': 3.406266795835913e-06, 'epoch': 0.74} + + 74%|███████▎ | 5440/7378 [18:39:51<6:38:47, 12.35s/it] + 74%|███████▎ | 5441/7378 [18:40:03<6:36:52, 12.29s/it] + +{'loss': 0.3945, 'learning_rate': 3.40296684635141e-06, 'epoch': 0.74} + + 74%|███████▎ | 5441/7378 [18:40:03<6:36:52, 12.29s/it] + 74%|███████▍ | 5442/7378 [18:40:15<6:35:05, 12.24s/it] + +{'loss': 0.375, 'learning_rate': 3.399668168341662e-06, 'epoch': 0.74} + + 74%|███████▍ | 5442/7378 [18:40:15<6:35:05, 12.24s/it] + 74%|███████▍ | 5443/7378 [18:40:27<6:35:20, 12.26s/it] + +{'loss': 0.4399, 'learning_rate': 3.3963707624424314e-06, 'epoch': 0.74} + + 74%|███████▍ | 5443/7378 [18:40:27<6:35:20, 12.26s/it] + 74%|███████▍ | 5444/7378 [18:40:40<6:37:00, 12.32s/it] + +{'loss': 0.4623, 'learning_rate': 3.3930746292892503e-06, 'epoch': 0.74} + + 74%|███████▍ | 5444/7378 [18:40:40<6:37:00, 12.32s/it] + 74%|███████▍ | 5445/7378 [18:40:52<6:36:59, 12.32s/it] + +{'loss': 0.4793, 'learning_rate': 3.389779769517393e-06, 'epoch': 0.74} + + 74%|███████▍ | 5445/7378 [18:40:52<6:36:59, 12.32s/it] + 74%|███████▍ | 5446/7378 [18:41:05<6:38:12, 12.37s/it] + +{'loss': 0.4377, 'learning_rate': 3.3864861837618914e-06, 'epoch': 0.74} + + 74%|███████▍ | 5446/7378 [18:41:05<6:38:12, 12.37s/it] + 74%|███████▍ | 5447/7378 [18:41:17<6:34:26, 12.26s/it] + +{'loss': 0.4126, 'learning_rate': 3.383193872657533e-06, 'epoch': 0.74} + + 74%|███████▍ | 5447/7378 [18:41:17<6:34:26, 12.26s/it] + 74%|███████▍ | 5448/7378 
[18:41:29<6:33:16, 12.23s/it] + +{'loss': 0.4582, 'learning_rate': 3.3799028368388554e-06, 'epoch': 0.74} + + 74%|███████▍ | 5448/7378 [18:41:29<6:33:16, 12.23s/it] + 74%|███████▍ | 5449/7378 [18:41:41<6:30:32, 12.15s/it] + +{'loss': 0.4457, 'learning_rate': 3.3766130769401617e-06, 'epoch': 0.74} + + 74%|███████▍ | 5449/7378 [18:41:41<6:30:32, 12.15s/it] + 74%|███████▍ | 5450/7378 [18:41:53<6:32:07, 12.20s/it] + +{'loss': 0.4237, 'learning_rate': 3.3733245935954973e-06, 'epoch': 0.74} + + 74%|███████▍ | 5450/7378 [18:41:53<6:32:07, 12.20s/it] + 74%|███████▍ | 5451/7378 [18:42:06<6:34:53, 12.30s/it] + +{'loss': 0.4118, 'learning_rate': 3.370037387438667e-06, 'epoch': 0.74} + + 74%|███████▍ | 5451/7378 [18:42:06<6:34:53, 12.30s/it] + 74%|███████▍ | 5452/7378 [18:42:18<6:32:44, 12.23s/it] + +{'loss': 0.3984, 'learning_rate': 3.366751459103227e-06, 'epoch': 0.74} + + 74%|███████▍ | 5452/7378 [18:42:18<6:32:44, 12.23s/it] + 74%|███████▍ | 5453/7378 [18:42:29<6:28:08, 12.10s/it] + +{'loss': 0.4925, 'learning_rate': 3.3634668092224853e-06, 'epoch': 0.74} + + 74%|███████▍ | 5453/7378 [18:42:29<6:28:08, 12.10s/it] + 74%|███████▍ | 5454/7378 [18:42:42<6:28:01, 12.10s/it] + +{'loss': 0.4993, 'learning_rate': 3.360183438429514e-06, 'epoch': 0.74} + + 74%|███████▍ | 5454/7378 [18:42:42<6:28:01, 12.10s/it] + 74%|███████▍ | 5455/7378 [18:42:54<6:28:45, 12.13s/it] + +{'loss': 0.387, 'learning_rate': 3.3569013473571276e-06, 'epoch': 0.74} + + 74%|███████▍ | 5455/7378 [18:42:54<6:28:45, 12.13s/it] + 74%|███████▍ | 5456/7378 [18:43:06<6:33:50, 12.29s/it] + +{'loss': 0.4664, 'learning_rate': 3.3536205366378983e-06, 'epoch': 0.74} + + 74%|███████▍ | 5456/7378 [18:43:06<6:33:50, 12.29s/it] + 74%|███████▍ | 5457/7378 [18:43:19<6:35:06, 12.34s/it] + +{'loss': 0.4359, 'learning_rate': 3.3503410069041473e-06, 'epoch': 0.74} + + 74%|███████▍ | 5457/7378 [18:43:19<6:35:06, 12.34s/it] + 74%|███████▍ | 5458/7378 [18:43:31<6:31:14, 12.23s/it] + +{'loss': 0.4285, 'learning_rate': 
3.347062758787959e-06, 'epoch': 0.74} + + 74%|███████▍ | 5458/7378 [18:43:31<6:31:14, 12.23s/it] + 74%|███████▍ | 5459/7378 [18:43:43<6:32:15, 12.26s/it] + +{'loss': 0.4728, 'learning_rate': 3.3437857929211604e-06, 'epoch': 0.74} + + 74%|███████▍ | 5459/7378 [18:43:43<6:32:15, 12.26s/it] + 74%|███████▍ | 5460/7378 [18:43:55<6:27:34, 12.12s/it] + +{'loss': 0.441, 'learning_rate': 3.3405101099353367e-06, 'epoch': 0.74} + + 74%|███████▍ | 5460/7378 [18:43:55<6:27:34, 12.12s/it] + 74%|███████▍ | 5461/7378 [18:44:08<6:33:33, 12.32s/it] + +{'loss': 0.4564, 'learning_rate': 3.3372357104618237e-06, 'epoch': 0.74} + + 74%|███████▍ | 5461/7378 [18:44:08<6:33:33, 12.32s/it] + 74%|███████▍ | 5462/7378 [18:44:20<6:35:01, 12.37s/it] + +{'loss': 0.4053, 'learning_rate': 3.333962595131708e-06, 'epoch': 0.74} + + 74%|███████▍ | 5462/7378 [18:44:20<6:35:01, 12.37s/it] + 74%|███████▍ | 5463/7378 [18:44:32<6:31:30, 12.27s/it] + +{'loss': 0.4297, 'learning_rate': 3.330690764575837e-06, 'epoch': 0.74} + + 74%|███████▍ | 5463/7378 [18:44:32<6:31:30, 12.27s/it] + 74%|███████▍ | 5464/7378 [18:44:45<6:31:55, 12.29s/it] + +{'loss': 0.396, 'learning_rate': 3.3274202194248004e-06, 'epoch': 0.74} + + 74%|███████▍ | 5464/7378 [18:44:45<6:31:55, 12.29s/it] + 74%|███████▍ | 5465/7378 [18:44:57<6:33:44, 12.35s/it] + +{'loss': 0.4792, 'learning_rate': 3.324150960308947e-06, 'epoch': 0.74} + + 74%|███████▍ | 5465/7378 [18:44:57<6:33:44, 12.35s/it] + 74%|███████▍ | 5466/7378 [18:45:10<6:34:19, 12.37s/it] + +{'loss': 0.4532, 'learning_rate': 3.3208829878583714e-06, 'epoch': 0.74} + + 74%|███████▍ | 5466/7378 [18:45:10<6:34:19, 12.37s/it] + 74%|███████▍ | 5467/7378 [18:45:21<6:30:26, 12.26s/it] + +{'loss': 0.4916, 'learning_rate': 3.3176163027029296e-06, 'epoch': 0.74} + + 74%|███████▍ | 5467/7378 [18:45:21<6:30:26, 12.26s/it] + 74%|███████▍ | 5468/7378 [18:45:34<6:27:49, 12.18s/it] + +{'loss': 0.438, 'learning_rate': 3.314350905472221e-06, 'epoch': 0.74} + + 74%|███████▍ | 5468/7378 [18:45:34<6:27:49, 
12.18s/it] + 74%|███████▍ | 5469/7378 [18:45:45<6:25:10, 12.11s/it] + +{'loss': 0.4273, 'learning_rate': 3.3110867967955993e-06, 'epoch': 0.74} + + 74%|███████▍ | 5469/7378 [18:45:45<6:25:10, 12.11s/it] + 74%|███████▍ | 5470/7378 [18:45:58<6:27:05, 12.17s/it] + +{'loss': 0.3529, 'learning_rate': 3.3078239773021726e-06, 'epoch': 0.74} + + 74%|███████▍ | 5470/7378 [18:45:58<6:27:05, 12.17s/it] + 74%|███████▍ | 5471/7378 [18:46:10<6:26:56, 12.17s/it] + +{'loss': 0.438, 'learning_rate': 3.3045624476207916e-06, 'epoch': 0.74} + + 74%|███████▍ | 5471/7378 [18:46:10<6:26:56, 12.17s/it] + 74%|███████▍ | 5472/7378 [18:46:22<6:28:28, 12.23s/it] + +{'loss': 0.4357, 'learning_rate': 3.301302208380074e-06, 'epoch': 0.74} + + 74%|███████▍ | 5472/7378 [18:46:22<6:28:28, 12.23s/it] + 74%|███████▍ | 5473/7378 [18:46:34<6:26:03, 12.16s/it] + +{'loss': 0.4069, 'learning_rate': 3.2980432602083754e-06, 'epoch': 0.74} + + 74%|███████▍ | 5473/7378 [18:46:34<6:26:03, 12.16s/it] + 74%|███████▍ | 5474/7378 [18:46:47<6:27:59, 12.23s/it] + +{'loss': 0.4013, 'learning_rate': 3.2947856037338077e-06, 'epoch': 0.74} + + 74%|███████▍ | 5474/7378 [18:46:47<6:27:59, 12.23s/it] + 74%|███████▍ | 5475/7378 [18:46:59<6:25:41, 12.16s/it] + +{'loss': 0.4184, 'learning_rate': 3.29152923958423e-06, 'epoch': 0.74} + + 74%|███████▍ | 5475/7378 [18:46:59<6:25:41, 12.16s/it] + 74%|███████▍ | 5476/7378 [18:47:11<6:25:31, 12.16s/it] + +{'loss': 0.4282, 'learning_rate': 3.288274168387261e-06, 'epoch': 0.74} + + 74%|███████▍ | 5476/7378 [18:47:11<6:25:31, 12.16s/it] + 74%|███████▍ | 5477/7378 [18:47:23<6:23:24, 12.10s/it] + +{'loss': 0.4352, 'learning_rate': 3.2850203907702616e-06, 'epoch': 0.74} + + 74%|███████▍ | 5477/7378 [18:47:23<6:23:24, 12.10s/it] + 74%|███████▍ | 5478/7378 [18:47:35<6:24:50, 12.15s/it] + +{'loss': 0.3554, 'learning_rate': 3.281767907360347e-06, 'epoch': 0.74} + + 74%|███████▍ | 5478/7378 [18:47:35<6:24:50, 12.15s/it] + 74%|███████▍ | 5479/7378 [18:47:47<6:23:46, 12.13s/it] + +{'loss': 
0.4862, 'learning_rate': 3.2785167187843825e-06, 'epoch': 0.74} + + 74%|███████▍ | 5479/7378 [18:47:47<6:23:46, 12.13s/it] + 74%|███████▍ | 5480/7378 [18:48:00<6:27:49, 12.26s/it] + +{'loss': 0.4721, 'learning_rate': 3.2752668256689803e-06, 'epoch': 0.74} + + 74%|███████▍ | 5480/7378 [18:48:00<6:27:49, 12.26s/it] + 74%|███████▍ | 5481/7378 [18:48:12<6:27:27, 12.25s/it] + +{'loss': 0.3499, 'learning_rate': 3.2720182286405088e-06, 'epoch': 0.74} + + 74%|███████▍ | 5481/7378 [18:48:12<6:27:27, 12.25s/it] + 74%|███████▍ | 5482/7378 [18:48:25<6:31:43, 12.40s/it] + +{'loss': 0.4642, 'learning_rate': 3.268770928325088e-06, 'epoch': 0.74} + + 74%|███████▍ | 5482/7378 [18:48:25<6:31:43, 12.40s/it] + 74%|███████▍ | 5483/7378 [18:48:37<6:30:11, 12.35s/it] + +{'loss': 0.4866, 'learning_rate': 3.265524925348582e-06, 'epoch': 0.74} + + 74%|███████▍ | 5483/7378 [18:48:37<6:30:11, 12.35s/it] + 74%|███████▍ | 5484/7378 [18:48:49<6:27:24, 12.27s/it] + +{'loss': 0.4365, 'learning_rate': 3.2622802203366057e-06, 'epoch': 0.74} + + 74%|███████▍ | 5484/7378 [18:48:49<6:27:24, 12.27s/it] + 74%|███████▍ | 5485/7378 [18:49:01<6:26:07, 12.24s/it] + +{'loss': 0.4364, 'learning_rate': 3.2590368139145212e-06, 'epoch': 0.74} + + 74%|███████▍ | 5485/7378 [18:49:01<6:26:07, 12.24s/it] + 74%|███████▍ | 5486/7378 [18:49:14<6:28:17, 12.31s/it] + +{'loss': 0.422, 'learning_rate': 3.2557947067074524e-06, 'epoch': 0.74} + + 74%|███████▍ | 5486/7378 [18:49:14<6:28:17, 12.31s/it] + 74%|███████▍ | 5487/7378 [18:49:26<6:28:10, 12.32s/it] + +{'loss': 0.3714, 'learning_rate': 3.2525538993402605e-06, 'epoch': 0.74} + + 74%|███████▍ | 5487/7378 [18:49:26<6:28:10, 12.32s/it] + 74%|███████▍ | 5488/7378 [18:49:39<6:30:21, 12.39s/it] + +{'loss': 0.4764, 'learning_rate': 3.2493143924375616e-06, 'epoch': 0.74} + + 74%|███████▍ | 5488/7378 [18:49:39<6:30:21, 12.39s/it] + 74%|███████▍ | 5489/7378 [18:49:51<6:31:15, 12.43s/it] + +{'loss': 0.5145, 'learning_rate': 3.2460761866237177e-06, 'epoch': 0.74} + + 74%|███████▍ | 
5489/7378 [18:49:51<6:31:15, 12.43s/it] + 74%|███████▍ | 5490/7378 [18:50:03<6:25:51, 12.26s/it] + +{'loss': 0.4304, 'learning_rate': 3.2428392825228405e-06, 'epoch': 0.74} + + 74%|███████▍ | 5490/7378 [18:50:03<6:25:51, 12.26s/it] + 74%|███████▍ | 5491/7378 [18:50:15<6:24:38, 12.23s/it] + +{'loss': 0.3926, 'learning_rate': 3.2396036807587993e-06, 'epoch': 0.74} + + 74%|███████▍ | 5491/7378 [18:50:15<6:24:38, 12.23s/it] + 74%|███████▍ | 5492/7378 [18:50:28<6:27:30, 12.33s/it] + +{'loss': 0.4392, 'learning_rate': 3.236369381955201e-06, 'epoch': 0.74} + + 74%|███████▍ | 5492/7378 [18:50:28<6:27:30, 12.33s/it] + 74%|███████▍ | 5493/7378 [18:50:40<6:26:41, 12.31s/it] + +{'loss': 0.4299, 'learning_rate': 3.233136386735407e-06, 'epoch': 0.74} + + 74%|███████▍ | 5493/7378 [18:50:40<6:26:41, 12.31s/it] + 74%|███████▍ | 5494/7378 [18:50:52<6:24:58, 12.26s/it] + +{'loss': 0.3709, 'learning_rate': 3.2299046957225233e-06, 'epoch': 0.74} + + 74%|███████▍ | 5494/7378 [18:50:52<6:24:58, 12.26s/it] + 74%|███████▍ | 5495/7378 [18:51:05<6:27:33, 12.35s/it] + +{'loss': 0.488, 'learning_rate': 3.2266743095394124e-06, 'epoch': 0.74} + + 74%|███████▍ | 5495/7378 [18:51:05<6:27:33, 12.35s/it] + 74%|███████▍ | 5496/7378 [18:51:17<6:25:44, 12.30s/it] + +{'loss': 0.4948, 'learning_rate': 3.2234452288086802e-06, 'epoch': 0.74} + + 74%|███████▍ | 5496/7378 [18:51:17<6:25:44, 12.30s/it] + 75%|███████▍ | 5497/7378 [18:51:29<6:24:05, 12.25s/it] + +{'loss': 0.509, 'learning_rate': 3.2202174541526808e-06, 'epoch': 0.75} + + 75%|███████▍ | 5497/7378 [18:51:29<6:24:05, 12.25s/it] + 75%|███████▍ | 5498/7378 [18:51:41<6:24:52, 12.28s/it] + +{'loss': 0.4227, 'learning_rate': 3.2169909861935157e-06, 'epoch': 0.75} + + 75%|███████▍ | 5498/7378 [18:51:41<6:24:52, 12.28s/it] + 75%|███████▍ | 5499/7378 [18:51:54<6:24:29, 12.28s/it] + +{'loss': 0.4576, 'learning_rate': 3.2137658255530325e-06, 'epoch': 0.75} + + 75%|███████▍ | 5499/7378 [18:51:54<6:24:29, 12.28s/it] + 75%|███████▍ | 5500/7378 
[18:52:06<6:28:44, 12.42s/it] + +{'loss': 0.5011, 'learning_rate': 3.2105419728528387e-06, 'epoch': 0.75} + + 75%|███████▍ | 5500/7378 [18:52:06<6:28:44, 12.42s/it] + 75%|███████▍ | 5501/7378 [18:52:19<6:27:22, 12.38s/it] + +{'loss': 0.4435, 'learning_rate': 3.2073194287142774e-06, 'epoch': 0.75} + + 75%|███████▍ | 5501/7378 [18:52:19<6:27:22, 12.38s/it] + 75%|███████▍ | 5502/7378 [18:52:31<6:22:32, 12.23s/it] + +{'loss': 0.475, 'learning_rate': 3.2040981937584435e-06, 'epoch': 0.75} + + 75%|███████▍ | 5502/7378 [18:52:31<6:22:32, 12.23s/it] + 75%|███████▍ | 5503/7378 [18:52:43<6:22:00, 12.22s/it] + +{'loss': 0.4374, 'learning_rate': 3.200878268606179e-06, 'epoch': 0.75} + + 75%|███████▍ | 5503/7378 [18:52:43<6:22:00, 12.22s/it] + 75%|███████▍ | 5504/7378 [18:52:55<6:22:09, 12.24s/it] + +{'loss': 0.4214, 'learning_rate': 3.197659653878071e-06, 'epoch': 0.75} + + 75%|███████▍ | 5504/7378 [18:52:55<6:22:09, 12.24s/it] + 75%|███████▍ | 5505/7378 [18:53:07<6:21:59, 12.24s/it] + +{'loss': 0.414, 'learning_rate': 3.1944423501944643e-06, 'epoch': 0.75} + + 75%|███████▍ | 5505/7378 [18:53:07<6:21:59, 12.24s/it] + 75%|███████▍ | 5506/7378 [18:53:19<6:21:34, 12.23s/it] + +{'loss': 0.448, 'learning_rate': 3.1912263581754397e-06, 'epoch': 0.75} + + 75%|███████▍ | 5506/7378 [18:53:19<6:21:34, 12.23s/it] + 75%|███████▍ | 5507/7378 [18:53:32<6:24:09, 12.32s/it] + +{'loss': 0.4111, 'learning_rate': 3.18801167844083e-06, 'epoch': 0.75} + + 75%|███████▍ | 5507/7378 [18:53:32<6:24:09, 12.32s/it] + 75%|███████▍ | 5508/7378 [18:53:44<6:25:04, 12.36s/it] + +{'loss': 0.4248, 'learning_rate': 3.184798311610211e-06, 'epoch': 0.75} + + 75%|███████▍ | 5508/7378 [18:53:44<6:25:04, 12.36s/it] + 75%|███████▍ | 5509/7378 [18:53:57<6:22:57, 12.29s/it] + +{'loss': 0.4464, 'learning_rate': 3.1815862583029143e-06, 'epoch': 0.75} + + 75%|███████▍ | 5509/7378 [18:53:57<6:22:57, 12.29s/it] + 75%|███████▍ | 5510/7378 [18:54:09<6:20:46, 12.23s/it] + +{'loss': 0.4496, 'learning_rate': 
3.1783755191380094e-06, 'epoch': 0.75} + + 75%|███████▍ | 5510/7378 [18:54:09<6:20:46, 12.23s/it] + 75%|███████▍ | 5511/7378 [18:54:21<6:20:24, 12.23s/it] + +{'loss': 0.3935, 'learning_rate': 3.1751660947343176e-06, 'epoch': 0.75} + + 75%|███████▍ | 5511/7378 [18:54:21<6:20:24, 12.23s/it] + 75%|███████▍ | 5512/7378 [18:54:33<6:19:30, 12.20s/it] + +{'loss': 0.4053, 'learning_rate': 3.1719579857104042e-06, 'epoch': 0.75} + + 75%|███████▍ | 5512/7378 [18:54:33<6:19:30, 12.20s/it] + 75%|███████▍ | 5513/7378 [18:54:45<6:20:35, 12.24s/it] + +{'loss': 0.3991, 'learning_rate': 3.1687511926845793e-06, 'epoch': 0.75} + + 75%|███████▍ | 5513/7378 [18:54:45<6:20:35, 12.24s/it] + 75%|███████▍ | 5514/7378 [18:54:57<6:19:12, 12.21s/it] + +{'loss': 0.4536, 'learning_rate': 3.165545716274908e-06, 'epoch': 0.75} + + 75%|███████▍ | 5514/7378 [18:54:57<6:19:12, 12.21s/it] + 75%|███████▍ | 5515/7378 [18:55:10<6:19:26, 12.22s/it] + +{'loss': 0.5318, 'learning_rate': 3.1623415570991923e-06, 'epoch': 0.75} + + 75%|███████▍ | 5515/7378 [18:55:10<6:19:26, 12.22s/it] + 75%|███████▍ | 5516/7378 [18:55:22<6:17:03, 12.15s/it] + +{'loss': 0.4087, 'learning_rate': 3.159138715774983e-06, 'epoch': 0.75} + + 75%|███████▍ | 5516/7378 [18:55:22<6:17:03, 12.15s/it] + 75%|███████▍ | 5517/7378 [18:55:34<6:22:09, 12.32s/it] + +{'loss': 0.4268, 'learning_rate': 3.1559371929195758e-06, 'epoch': 0.75} + + 75%|███████▍ | 5517/7378 [18:55:34<6:22:09, 12.32s/it] + 75%|███████▍ | 5518/7378 [18:55:47<6:22:10, 12.33s/it] + +{'loss': 0.4561, 'learning_rate': 3.1527369891500194e-06, 'epoch': 0.75} + + 75%|███████▍ | 5518/7378 [18:55:47<6:22:10, 12.33s/it] + 75%|███████▍ | 5519/7378 [18:55:59<6:19:05, 12.24s/it] + +{'loss': 0.4309, 'learning_rate': 3.149538105083101e-06, 'epoch': 0.75} + + 75%|███████▍ | 5519/7378 [18:55:59<6:19:05, 12.24s/it] + 75%|███████▍ | 5520/7378 [18:56:12<6:25:26, 12.45s/it] + +{'loss': 0.4544, 'learning_rate': 3.1463405413353533e-06, 'epoch': 0.75} + + 75%|███████▍ | 5520/7378 
[18:56:12<6:25:26, 12.45s/it] + 75%|███████▍ | 5521/7378 [18:56:24<6:26:29, 12.49s/it] + +{'loss': 0.4553, 'learning_rate': 3.1431442985230585e-06, 'epoch': 0.75} + + 75%|███████▍ | 5521/7378 [18:56:24<6:26:29, 12.49s/it] + 75%|███████▍ | 5522/7378 [18:56:36<6:23:14, 12.39s/it] + +{'loss': 0.4293, 'learning_rate': 3.139949377262238e-06, 'epoch': 0.75} + + 75%|███████▍ | 5522/7378 [18:56:36<6:23:14, 12.39s/it] + 75%|███████▍ | 5523/7378 [18:56:49<6:21:49, 12.35s/it] + +{'loss': 0.502, 'learning_rate': 3.1367557781686697e-06, 'epoch': 0.75} + + 75%|███████▍ | 5523/7378 [18:56:49<6:21:49, 12.35s/it] + 75%|███████▍ | 5524/7378 [18:57:01<6:22:34, 12.38s/it] + +{'loss': 0.4439, 'learning_rate': 3.1335635018578635e-06, 'epoch': 0.75} + + 75%|███████▍ | 5524/7378 [18:57:01<6:22:34, 12.38s/it] + 75%|███████▍ | 5525/7378 [18:57:13<6:18:48, 12.27s/it] + +{'loss': 0.3821, 'learning_rate': 3.1303725489450864e-06, 'epoch': 0.75} + + 75%|███████▍ | 5525/7378 [18:57:13<6:18:48, 12.27s/it] + 75%|███████▍ | 5526/7378 [18:57:25<6:16:40, 12.20s/it] + +{'loss': 0.4458, 'learning_rate': 3.127182920045343e-06, 'epoch': 0.75} + + 75%|███████▍ | 5526/7378 [18:57:25<6:16:40, 12.20s/it] + 75%|███████▍ | 5527/7378 [18:57:38<6:20:56, 12.35s/it] + +{'loss': 0.3978, 'learning_rate': 3.123994615773378e-06, 'epoch': 0.75} + + 75%|███████▍ | 5527/7378 [18:57:38<6:20:56, 12.35s/it] + 75%|███████▍ | 5528/7378 [18:57:50<6:18:44, 12.28s/it] + +{'loss': 0.4628, 'learning_rate': 3.1208076367436966e-06, 'epoch': 0.75} + + 75%|███████▍ | 5528/7378 [18:57:50<6:18:44, 12.28s/it] + 75%|███████▍ | 5529/7378 [18:58:02<6:15:24, 12.18s/it] + +{'loss': 0.4682, 'learning_rate': 3.1176219835705345e-06, 'epoch': 0.75} + + 75%|███████▍ | 5529/7378 [18:58:02<6:15:24, 12.18s/it] + 75%|███████▍ | 5530/7378 [18:58:14<6:14:19, 12.15s/it] + +{'loss': 0.473, 'learning_rate': 3.1144376568678767e-06, 'epoch': 0.75} + + 75%|███████▍ | 5530/7378 [18:58:14<6:14:19, 12.15s/it] + 75%|███████▍ | 5531/7378 [18:58:26<6:12:20, 
12.10s/it] + +{'loss': 0.5021, 'learning_rate': 3.1112546572494515e-06, 'epoch': 0.75} + + 75%|███████▍ | 5531/7378 [18:58:26<6:12:20, 12.10s/it] + 75%|███████▍ | 5532/7378 [18:58:38<6:13:33, 12.14s/it] + +{'loss': 0.4411, 'learning_rate': 3.1080729853287293e-06, 'epoch': 0.75} + + 75%|███████▍ | 5532/7378 [18:58:38<6:13:33, 12.14s/it] + 75%|███████▍ | 5533/7378 [18:58:50<6:13:19, 12.14s/it] + +{'loss': 0.4808, 'learning_rate': 3.1048926417189353e-06, 'epoch': 0.75} + + 75%|███████▍ | 5533/7378 [18:58:50<6:13:19, 12.14s/it] + 75%|███████▌ | 5534/7378 [18:59:02<6:11:35, 12.09s/it] + +{'loss': 0.4096, 'learning_rate': 3.101713627033026e-06, 'epoch': 0.75} + + 75%|███████▌ | 5534/7378 [18:59:02<6:11:35, 12.09s/it] + 75%|███████▌ | 5535/7378 [18:59:15<6:14:45, 12.20s/it] + +{'loss': 0.3971, 'learning_rate': 3.098535941883708e-06, 'epoch': 0.75} + + 75%|███████▌ | 5535/7378 [18:59:15<6:14:45, 12.20s/it] + 75%|███████▌ | 5536/7378 [18:59:28<6:19:59, 12.38s/it] + +{'loss': 0.4033, 'learning_rate': 3.095359586883425e-06, 'epoch': 0.75} + + 75%|███████▌ | 5536/7378 [18:59:28<6:19:59, 12.38s/it] + 75%|███████▌ | 5537/7378 [18:59:40<6:20:33, 12.40s/it] + +{'loss': 0.4345, 'learning_rate': 3.092184562644378e-06, 'epoch': 0.75} + + 75%|███████▌ | 5537/7378 [18:59:40<6:20:33, 12.40s/it] + 75%|███████▌ | 5538/7378 [18:59:52<6:19:54, 12.39s/it] + +{'loss': 0.3803, 'learning_rate': 3.0890108697785003e-06, 'epoch': 0.75} + + 75%|███████▌ | 5538/7378 [18:59:52<6:19:54, 12.39s/it] + 75%|███████▌ | 5539/7378 [19:00:04<6:14:34, 12.22s/it] + +{'loss': 0.4723, 'learning_rate': 3.0858385088974696e-06, 'epoch': 0.75} + + 75%|███████▌ | 5539/7378 [19:00:04<6:14:34, 12.22s/it] + 75%|███████▌ | 5540/7378 [19:00:17<6:15:49, 12.27s/it] + +{'loss': 0.4824, 'learning_rate': 3.08266748061271e-06, 'epoch': 0.75} + + 75%|███████▌ | 5540/7378 [19:00:17<6:15:49, 12.27s/it] + 75%|███████▌ | 5541/7378 [19:00:29<6:16:32, 12.30s/it] + +{'loss': 0.4103, 'learning_rate': 3.0794977855353835e-06, 'epoch': 
0.75} + + 75%|███████▌ | 5541/7378 [19:00:29<6:16:32, 12.30s/it] + 75%|███████▌ | 5542/7378 [19:00:41<6:12:20, 12.17s/it] + +{'loss': 0.4518, 'learning_rate': 3.0763294242764064e-06, 'epoch': 0.75} + + 75%|███████▌ | 5542/7378 [19:00:41<6:12:20, 12.17s/it] + 75%|███████▌ | 5543/7378 [19:00:54<6:18:47, 12.39s/it] + +{'loss': 0.4556, 'learning_rate': 3.0731623974464265e-06, 'epoch': 0.75} + + 75%|███████▌ | 5543/7378 [19:00:54<6:18:47, 12.39s/it] + 75%|███████▌ | 5544/7378 [19:01:06<6:15:57, 12.30s/it] + +{'loss': 0.4297, 'learning_rate': 3.06999670565584e-06, 'epoch': 0.75} + + 75%|███████▌ | 5544/7378 [19:01:06<6:15:57, 12.30s/it] + 75%|███████▌ | 5545/7378 [19:01:19<6:20:31, 12.46s/it] + +{'loss': 0.4261, 'learning_rate': 3.06683234951478e-06, 'epoch': 0.75} + + 75%|███████▌ | 5545/7378 [19:01:19<6:20:31, 12.46s/it] + 75%|███████▌ | 5546/7378 [19:01:31<6:19:37, 12.43s/it] + +{'loss': 0.4148, 'learning_rate': 3.0636693296331334e-06, 'epoch': 0.75} + + 75%|███████▌ | 5546/7378 [19:01:31<6:19:37, 12.43s/it] + 75%|███████▌ | 5547/7378 [19:01:44<6:20:32, 12.47s/it] + +{'loss': 0.4173, 'learning_rate': 3.0605076466205196e-06, 'epoch': 0.75} + + 75%|███████▌ | 5547/7378 [19:01:44<6:20:32, 12.47s/it] + 75%|███████▌ | 5548/7378 [19:01:56<6:17:33, 12.38s/it] + +{'loss': 0.4458, 'learning_rate': 3.0573473010863032e-06, 'epoch': 0.75} + + 75%|███████▌ | 5548/7378 [19:01:56<6:17:33, 12.38s/it] + 75%|███████▌ | 5549/7378 [19:02:08<6:19:03, 12.43s/it] + +{'loss': 0.4279, 'learning_rate': 3.0541882936395917e-06, 'epoch': 0.75} + + 75%|███████▌ | 5549/7378 [19:02:08<6:19:03, 12.43s/it] + 75%|███████▌ | 5550/7378 [19:02:21<6:19:06, 12.44s/it] + +{'loss': 0.4302, 'learning_rate': 3.0510306248892307e-06, 'epoch': 0.75} + + 75%|███████▌ | 5550/7378 [19:02:21<6:19:06, 12.44s/it] + 75%|███████▌ | 5551/7378 [19:02:33<6:19:12, 12.45s/it] + +{'loss': 0.4572, 'learning_rate': 3.0478742954438166e-06, 'epoch': 0.75} + + 75%|███████▌ | 5551/7378 [19:02:33<6:19:12, 12.45s/it] + 75%|███████▌ | 
5552/7378 [19:02:46<6:17:44, 12.41s/it] + +{'loss': 0.4829, 'learning_rate': 3.0447193059116818e-06, 'epoch': 0.75} + + 75%|███████▌ | 5552/7378 [19:02:46<6:17:44, 12.41s/it] + 75%|███████▌ | 5553/7378 [19:02:58<6:15:48, 12.36s/it] + +{'loss': 0.4312, 'learning_rate': 3.0415656569009e-06, 'epoch': 0.75} + + 75%|███████▌ | 5553/7378 [19:02:58<6:15:48, 12.36s/it] + 75%|███████▌ | 5554/7378 [19:03:10<6:15:01, 12.34s/it] + +{'loss': 0.3904, 'learning_rate': 3.0384133490192836e-06, 'epoch': 0.75} + + 75%|███████▌ | 5554/7378 [19:03:10<6:15:01, 12.34s/it] + 75%|███████▌ | 5555/7378 [19:03:23<6:15:25, 12.36s/it] + +{'loss': 0.4564, 'learning_rate': 3.0352623828743977e-06, 'epoch': 0.75} + + 75%|███████▌ | 5555/7378 [19:03:23<6:15:25, 12.36s/it] + 75%|███████▌ | 5556/7378 [19:03:35<6:18:21, 12.46s/it] + +{'loss': 0.4386, 'learning_rate': 3.0321127590735377e-06, 'epoch': 0.75} + + 75%|███████▌ | 5556/7378 [19:03:35<6:18:21, 12.46s/it] + 75%|███████▌ | 5557/7378 [19:03:47<6:15:10, 12.36s/it] + +{'loss': 0.4591, 'learning_rate': 3.028964478223745e-06, 'epoch': 0.75} + + 75%|███████▌ | 5557/7378 [19:03:47<6:15:10, 12.36s/it] + 75%|███████▌ | 5558/7378 [19:04:00<6:21:23, 12.57s/it] + +{'loss': 0.4311, 'learning_rate': 3.0258175409318015e-06, 'epoch': 0.75} + + 75%|███████▌ | 5558/7378 [19:04:00<6:21:23, 12.57s/it] + 75%|███████▌ | 5559/7378 [19:04:13<6:19:20, 12.51s/it] + +{'loss': 0.4537, 'learning_rate': 3.0226719478042267e-06, 'epoch': 0.75} + + 75%|███████▌ | 5559/7378 [19:04:13<6:19:20, 12.51s/it] + 75%|███████▌ | 5560/7378 [19:04:25<6:14:55, 12.37s/it] + +{'loss': 0.4927, 'learning_rate': 3.01952769944729e-06, 'epoch': 0.75} + + 75%|███████▌ | 5560/7378 [19:04:25<6:14:55, 12.37s/it] + 75%|███████▌ | 5561/7378 [19:04:37<6:14:07, 12.35s/it] + +{'loss': 0.3987, 'learning_rate': 3.0163847964669933e-06, 'epoch': 0.75} + + 75%|███████▌ | 5561/7378 [19:04:37<6:14:07, 12.35s/it] + 75%|███████▌ | 5562/7378 [19:04:50<6:16:49, 12.45s/it] + +{'loss': 0.4279, 'learning_rate': 
3.0132432394690827e-06, 'epoch': 0.75} + + 75%|███████▌ | 5562/7378 [19:04:50<6:16:49, 12.45s/it] + 75%|███████▌ | 5563/7378 [19:05:02<6:15:05, 12.40s/it] + +{'loss': 0.4487, 'learning_rate': 3.010103029059043e-06, 'epoch': 0.75} + + 75%|███████▌ | 5563/7378 [19:05:02<6:15:05, 12.40s/it] + 75%|███████▌ | 5564/7378 [19:05:15<6:16:07, 12.44s/it] + +{'loss': 0.4258, 'learning_rate': 3.0069641658420965e-06, 'epoch': 0.75} + + 75%|███████▌ | 5564/7378 [19:05:15<6:16:07, 12.44s/it] + 75%|███████▌ | 5565/7378 [19:05:27<6:12:38, 12.33s/it] + +{'loss': 0.4142, 'learning_rate': 3.0038266504232194e-06, 'epoch': 0.75} + + 75%|███████▌ | 5565/7378 [19:05:27<6:12:38, 12.33s/it] + 75%|███████▌ | 5566/7378 [19:05:39<6:10:56, 12.28s/it] + +{'loss': 0.3458, 'learning_rate': 3.0006904834071126e-06, 'epoch': 0.75} + + 75%|███████▌ | 5566/7378 [19:05:39<6:10:56, 12.28s/it] + 75%|███████▌ | 5567/7378 [19:05:51<6:08:43, 12.22s/it] + +{'loss': 0.4783, 'learning_rate': 2.9975556653982252e-06, 'epoch': 0.75} + + 75%|███████▌ | 5567/7378 [19:05:51<6:08:43, 12.22s/it] + 75%|███████▌ | 5568/7378 [19:06:03<6:08:30, 12.22s/it] + +{'loss': 0.4359, 'learning_rate': 2.9944221970007382e-06, 'epoch': 0.75} + + 75%|███████▌ | 5568/7378 [19:06:03<6:08:30, 12.22s/it] + 75%|███████▌ | 5569/7378 [19:06:15<6:07:07, 12.18s/it] + +{'loss': 0.4775, 'learning_rate': 2.991290078818585e-06, 'epoch': 0.75} + + 75%|███████▌ | 5569/7378 [19:06:15<6:07:07, 12.18s/it] + 75%|███████▌ | 5570/7378 [19:06:28<6:11:12, 12.32s/it] + +{'loss': 0.4574, 'learning_rate': 2.988159311455433e-06, 'epoch': 0.75} + + 75%|███████▌ | 5570/7378 [19:06:28<6:11:12, 12.32s/it] + 76%|███████▌ | 5571/7378 [19:06:40<6:12:44, 12.38s/it] + +{'loss': 0.4278, 'learning_rate': 2.985029895514686e-06, 'epoch': 0.76} + + 76%|███████▌ | 5571/7378 [19:06:40<6:12:44, 12.38s/it] + 76%|███████▌ | 5572/7378 [19:06:53<6:15:19, 12.47s/it] + +{'loss': 0.4008, 'learning_rate': 2.9819018315994907e-06, 'epoch': 0.76} + + 76%|███████▌ | 5572/7378 
[19:06:53<6:15:19, 12.47s/it] + 76%|███████▌ | 5573/7378 [19:07:05<6:14:02, 12.43s/it] + +{'loss': 0.4455, 'learning_rate': 2.9787751203127323e-06, 'epoch': 0.76} + + 76%|███████▌ | 5573/7378 [19:07:06<6:14:02, 12.43s/it] + 76%|███████▌ | 5574/7378 [19:07:18<6:18:41, 12.59s/it] + +{'loss': 0.3998, 'learning_rate': 2.975649762257031e-06, 'epoch': 0.76} + + 76%|███████▌ | 5574/7378 [19:07:18<6:18:41, 12.59s/it] + 76%|███████▌ | 5575/7378 [19:07:31<6:15:54, 12.51s/it] + +{'loss': 0.4239, 'learning_rate': 2.972525758034759e-06, 'epoch': 0.76} + + 76%|███████▌ | 5575/7378 [19:07:31<6:15:54, 12.51s/it] + 76%|███████▌ | 5576/7378 [19:07:43<6:12:38, 12.41s/it] + +{'loss': 0.4148, 'learning_rate': 2.9694031082480135e-06, 'epoch': 0.76} + + 76%|███████▌ | 5576/7378 [19:07:43<6:12:38, 12.41s/it] + 76%|███████▌ | 5577/7378 [19:07:55<6:11:33, 12.38s/it] + +{'loss': 0.4382, 'learning_rate': 2.966281813498637e-06, 'epoch': 0.76} + + 76%|███████▌ | 5577/7378 [19:07:55<6:11:33, 12.38s/it] + 76%|███████▌ | 5578/7378 [19:08:08<6:15:01, 12.50s/it] + +{'loss': 0.4488, 'learning_rate': 2.9631618743882086e-06, 'epoch': 0.76} + + 76%|███████▌ | 5578/7378 [19:08:08<6:15:01, 12.50s/it] + 76%|███████▌ | 5579/7378 [19:08:20<6:13:06, 12.44s/it] + +{'loss': 0.419, 'learning_rate': 2.960043291518052e-06, 'epoch': 0.76} + + 76%|███████▌ | 5579/7378 [19:08:20<6:13:06, 12.44s/it] + 76%|███████▌ | 5580/7378 [19:08:33<6:10:56, 12.38s/it] + +{'loss': 0.4371, 'learning_rate': 2.956926065489224e-06, 'epoch': 0.76} + + 76%|███████▌ | 5580/7378 [19:08:33<6:10:56, 12.38s/it] + 76%|███████▌ | 5581/7378 [19:08:45<6:13:04, 12.46s/it] + +{'loss': 0.4712, 'learning_rate': 2.95381019690252e-06, 'epoch': 0.76} + + 76%|███████▌ | 5581/7378 [19:08:45<6:13:04, 12.46s/it] + 76%|███████▌ | 5582/7378 [19:08:58<6:12:25, 12.44s/it] + +{'loss': 0.439, 'learning_rate': 2.9506956863584734e-06, 'epoch': 0.76} + + 76%|███████▌ | 5582/7378 [19:08:58<6:12:25, 12.44s/it] + 76%|███████▌ | 5583/7378 [19:09:10<6:09:41, 12.36s/it] + 
+{'loss': 0.4626, 'learning_rate': 2.947582534457357e-06, 'epoch': 0.76} + + 76%|███████▌ | 5583/7378 [19:09:10<6:09:41, 12.36s/it] + 76%|███████▌ | 5584/7378 [19:09:22<6:08:54, 12.34s/it] + +{'loss': 0.4589, 'learning_rate': 2.9444707417991857e-06, 'epoch': 0.76} + + 76%|███████▌ | 5584/7378 [19:09:22<6:08:54, 12.34s/it] + 76%|███████▌ | 5585/7378 [19:09:34<6:04:27, 12.20s/it] + +{'loss': 0.3937, 'learning_rate': 2.9413603089837084e-06, 'epoch': 0.76} + + 76%|███████▌ | 5585/7378 [19:09:34<6:04:27, 12.20s/it] + 76%|███████▌ | 5586/7378 [19:09:46<6:04:49, 12.21s/it] + +{'loss': 0.4537, 'learning_rate': 2.938251236610409e-06, 'epoch': 0.76} + + 76%|███████▌ | 5586/7378 [19:09:46<6:04:49, 12.21s/it] + 76%|███████▌ | 5587/7378 [19:09:58<6:03:41, 12.18s/it] + +{'loss': 0.4481, 'learning_rate': 2.935143525278512e-06, 'epoch': 0.76} + + 76%|███████▌ | 5587/7378 [19:09:58<6:03:41, 12.18s/it] + 76%|███████▌ | 5588/7378 [19:10:11<6:05:51, 12.26s/it] + +{'loss': 0.4755, 'learning_rate': 2.932037175586985e-06, 'epoch': 0.76} + + 76%|███████▌ | 5588/7378 [19:10:11<6:05:51, 12.26s/it] + 76%|███████▌ | 5589/7378 [19:10:23<6:09:14, 12.38s/it] + +{'loss': 0.4607, 'learning_rate': 2.9289321881345257e-06, 'epoch': 0.76} + + 76%|███████▌ | 5589/7378 [19:10:23<6:09:14, 12.38s/it] + 76%|███████▌ | 5590/7378 [19:10:36<6:07:01, 12.32s/it] + +{'loss': 0.4174, 'learning_rate': 2.9258285635195717e-06, 'epoch': 0.76} + + 76%|███████▌ | 5590/7378 [19:10:36<6:07:01, 12.32s/it] + 76%|███████▌ | 5591/7378 [19:10:48<6:03:42, 12.21s/it] + +{'loss': 0.4865, 'learning_rate': 2.9227263023402975e-06, 'epoch': 0.76} + + 76%|███████▌ | 5591/7378 [19:10:48<6:03:42, 12.21s/it] + 76%|███████▌ | 5592/7378 [19:11:00<6:04:15, 12.24s/it] + +{'loss': 0.4393, 'learning_rate': 2.9196254051946127e-06, 'epoch': 0.76} + + 76%|███████▌ | 5592/7378 [19:11:00<6:04:15, 12.24s/it] + 76%|███████▌ | 5593/7378 [19:11:12<6:05:40, 12.29s/it] + +{'loss': 0.4422, 'learning_rate': 2.9165258726801715e-06, 'epoch': 0.76} + + 
76%|███████▌ | 5593/7378 [19:11:12<6:05:40, 12.29s/it] + 76%|███████▌ | 5594/7378 [19:11:24<6:03:42, 12.23s/it] + +{'loss': 0.4778, 'learning_rate': 2.9134277053943594e-06, 'epoch': 0.76} + + 76%|███████▌ | 5594/7378 [19:11:24<6:03:42, 12.23s/it] + 76%|███████▌ | 5595/7378 [19:11:37<6:04:45, 12.27s/it] + +{'loss': 0.4454, 'learning_rate': 2.910330903934299e-06, 'epoch': 0.76} + + 76%|███████▌ | 5595/7378 [19:11:37<6:04:45, 12.27s/it] + 76%|███████▌ | 5596/7378 [19:11:49<6:07:03, 12.36s/it] + +{'loss': 0.4162, 'learning_rate': 2.9072354688968463e-06, 'epoch': 0.76} + + 76%|███████▌ | 5596/7378 [19:11:49<6:07:03, 12.36s/it] + 76%|███████▌ | 5597/7378 [19:12:01<6:04:10, 12.27s/it] + +{'loss': 0.4954, 'learning_rate': 2.904141400878604e-06, 'epoch': 0.76} + + 76%|███████▌ | 5597/7378 [19:12:01<6:04:10, 12.27s/it] + 76%|███████▌ | 5598/7378 [19:12:14<6:04:41, 12.29s/it] + +{'loss': 0.4801, 'learning_rate': 2.9010487004759024e-06, 'epoch': 0.76} + + 76%|███████▌ | 5598/7378 [19:12:14<6:04:41, 12.29s/it] + 76%|███████▌ | 5599/7378 [19:12:26<6:06:58, 12.38s/it] + +{'loss': 0.3863, 'learning_rate': 2.897957368284812e-06, 'epoch': 0.76} + + 76%|███████▌ | 5599/7378 [19:12:26<6:06:58, 12.38s/it] + 76%|███████▌ | 5600/7378 [19:12:39<6:07:19, 12.40s/it] + +{'loss': 0.3888, 'learning_rate': 2.894867404901137e-06, 'epoch': 0.76} + + 76%|███████▌ | 5600/7378 [19:12:39<6:07:19, 12.40s/it] + 76%|███████▌ | 5601/7378 [19:12:52<6:11:14, 12.54s/it] + +{'loss': 0.4544, 'learning_rate': 2.891778810920417e-06, 'epoch': 0.76} + + 76%|███████▌ | 5601/7378 [19:12:52<6:11:14, 12.54s/it] + 76%|███████▌ | 5602/7378 [19:13:03<6:04:02, 12.30s/it] + +{'loss': 0.4157, 'learning_rate': 2.888691586937937e-06, 'epoch': 0.76} + + 76%|███████▌ | 5602/7378 [19:13:03<6:04:02, 12.30s/it] + 76%|███████▌ | 5603/7378 [19:13:16<6:03:21, 12.28s/it] + +{'loss': 0.4626, 'learning_rate': 2.8856057335487074e-06, 'epoch': 0.76} + + 76%|███████▌ | 5603/7378 [19:13:16<6:03:21, 12.28s/it] + 76%|███████▌ | 5604/7378 
[19:13:28<6:01:24, 12.22s/it] + +{'loss': 0.4168, 'learning_rate': 2.8825212513474775e-06, 'epoch': 0.76} + + 76%|███████▌ | 5604/7378 [19:13:28<6:01:24, 12.22s/it] + 76%|███████▌ | 5605/7378 [19:13:40<6:00:38, 12.20s/it] + +{'loss': 0.4065, 'learning_rate': 2.8794381409287307e-06, 'epoch': 0.76} + + 76%|███████▌ | 5605/7378 [19:13:40<6:00:38, 12.20s/it] + 76%|███████▌ | 5606/7378 [19:13:52<5:58:15, 12.13s/it] + +{'loss': 0.4728, 'learning_rate': 2.876356402886694e-06, 'epoch': 0.76} + + 76%|███████▌ | 5606/7378 [19:13:52<5:58:15, 12.13s/it] + 76%|███████▌ | 5607/7378 [19:14:04<5:58:59, 12.16s/it] + +{'loss': 0.4571, 'learning_rate': 2.873276037815321e-06, 'epoch': 0.76} + + 76%|███████▌ | 5607/7378 [19:14:04<5:58:59, 12.16s/it] + 76%|███████▌ | 5608/7378 [19:14:17<6:03:33, 12.32s/it] + +{'loss': 0.4795, 'learning_rate': 2.870197046308304e-06, 'epoch': 0.76} + + 76%|███████▌ | 5608/7378 [19:14:17<6:03:33, 12.32s/it] + 76%|███████▌ | 5609/7378 [19:14:29<5:59:41, 12.20s/it] + +{'loss': 0.4531, 'learning_rate': 2.867119428959071e-06, 'epoch': 0.76} + + 76%|███████▌ | 5609/7378 [19:14:29<5:59:41, 12.20s/it] + 76%|███████▌ | 5610/7378 [19:14:41<6:03:14, 12.33s/it] + +{'loss': 0.4447, 'learning_rate': 2.86404318636078e-06, 'epoch': 0.76} + + 76%|███████▌ | 5610/7378 [19:14:41<6:03:14, 12.33s/it] + 76%|███████▌ | 5611/7378 [19:14:55<6:16:08, 12.77s/it] + +{'loss': 0.4295, 'learning_rate': 2.860968319106332e-06, 'epoch': 0.76} + + 76%|███████▌ | 5611/7378 [19:14:55<6:16:08, 12.77s/it] + 76%|███████▌ | 5612/7378 [19:15:07<6:12:37, 12.66s/it] + +{'loss': 0.4396, 'learning_rate': 2.857894827788362e-06, 'epoch': 0.76} + + 76%|███████▌ | 5612/7378 [19:15:07<6:12:37, 12.66s/it] + 76%|███████▌ | 5613/7378 [19:15:20<6:09:33, 12.56s/it] + +{'loss': 0.3978, 'learning_rate': 2.8548227129992367e-06, 'epoch': 0.76} + + 76%|███████▌ | 5613/7378 [19:15:20<6:09:33, 12.56s/it] + 76%|███████▌ | 5614/7378 [19:15:32<6:07:36, 12.50s/it] + +{'loss': 0.3975, 'learning_rate': 
2.8517519753310564e-06, 'epoch': 0.76} + + 76%|███████▌ | 5614/7378 [19:15:32<6:07:36, 12.50s/it] + 76%|███████▌ | 5615/7378 [19:15:45<6:06:33, 12.47s/it] + +{'loss': 0.416, 'learning_rate': 2.848682615375653e-06, 'epoch': 0.76} + + 76%|███████▌ | 5615/7378 [19:15:45<6:06:33, 12.47s/it] + 76%|███████▌ | 5616/7378 [19:15:57<6:06:14, 12.47s/it] + +{'loss': 0.4335, 'learning_rate': 2.845614633724607e-06, 'epoch': 0.76} + + 76%|███████▌ | 5616/7378 [19:15:57<6:06:14, 12.47s/it] + 76%|███████▌ | 5617/7378 [19:16:09<6:02:30, 12.35s/it] + +{'loss': 0.477, 'learning_rate': 2.8425480309692177e-06, 'epoch': 0.76} + + 76%|███████▌ | 5617/7378 [19:16:09<6:02:30, 12.35s/it] + 76%|███████▌ | 5618/7378 [19:16:22<6:03:30, 12.39s/it] + +{'loss': 0.3773, 'learning_rate': 2.8394828077005277e-06, 'epoch': 0.76} + + 76%|███████▌ | 5618/7378 [19:16:22<6:03:30, 12.39s/it] + 76%|███████▌ | 5619/7378 [19:16:34<6:00:34, 12.30s/it] + +{'loss': 0.4303, 'learning_rate': 2.8364189645093076e-06, 'epoch': 0.76} + + 76%|███████▌ | 5619/7378 [19:16:34<6:00:34, 12.30s/it] + 76%|███████▌ | 5620/7378 [19:16:46<5:57:22, 12.20s/it] + +{'loss': 0.4115, 'learning_rate': 2.8333565019860644e-06, 'epoch': 0.76} + + 76%|███████▌ | 5620/7378 [19:16:46<5:57:22, 12.20s/it] + 76%|███████▌ | 5621/7378 [19:16:58<5:54:48, 12.12s/it] + +{'loss': 0.4448, 'learning_rate': 2.830295420721044e-06, 'epoch': 0.76} + + 76%|███████▌ | 5621/7378 [19:16:58<5:54:48, 12.12s/it] + 76%|███████▌ | 5622/7378 [19:17:10<5:54:30, 12.11s/it] + +{'loss': 0.3475, 'learning_rate': 2.82723572130422e-06, 'epoch': 0.76} + + 76%|███████▌ | 5622/7378 [19:17:10<5:54:30, 12.11s/it] + 76%|███████▌ | 5623/7378 [19:17:22<5:52:49, 12.06s/it] + +{'loss': 0.4111, 'learning_rate': 2.8241774043253023e-06, 'epoch': 0.76} + + 76%|███████▌ | 5623/7378 [19:17:22<5:52:49, 12.06s/it] + 76%|███████▌ | 5624/7378 [19:17:34<5:58:02, 12.25s/it] + +{'loss': 0.4863, 'learning_rate': 2.821120470373733e-06, 'epoch': 0.76} + + 76%|███████▌ | 5624/7378 [19:17:34<5:58:02, 
12.25s/it] + 76%|███████▌ | 5625/7378 [19:17:46<5:55:24, 12.16s/it] + +{'loss': 0.4836, 'learning_rate': 2.8180649200386835e-06, 'epoch': 0.76} + + 76%|███████▌ | 5625/7378 [19:17:46<5:55:24, 12.16s/it] + 76%|███████▋ | 5626/7378 [19:17:59<5:58:27, 12.28s/it] + +{'loss': 0.414, 'learning_rate': 2.815010753909071e-06, 'epoch': 0.76} + + 76%|███████▋ | 5626/7378 [19:17:59<5:58:27, 12.28s/it] + 76%|███████▋ | 5627/7378 [19:18:11<5:56:46, 12.23s/it] + +{'loss': 0.4018, 'learning_rate': 2.811957972573535e-06, 'epoch': 0.76} + + 76%|███████▋ | 5627/7378 [19:18:11<5:56:46, 12.23s/it] + 76%|███████▋ | 5628/7378 [19:18:23<5:54:26, 12.15s/it] + +{'loss': 0.4356, 'learning_rate': 2.8089065766204504e-06, 'epoch': 0.76} + + 76%|███████▋ | 5628/7378 [19:18:23<5:54:26, 12.15s/it] + 76%|███████▋ | 5629/7378 [19:18:35<5:57:15, 12.26s/it] + +{'loss': 0.4372, 'learning_rate': 2.8058565666379233e-06, 'epoch': 0.76} + + 76%|███████▋ | 5629/7378 [19:18:35<5:57:15, 12.26s/it] + 76%|███████▋ | 5630/7378 [19:18:48<5:57:12, 12.26s/it] + +{'loss': 0.411, 'learning_rate': 2.8028079432138023e-06, 'epoch': 0.76} + + 76%|███████▋ | 5630/7378 [19:18:48<5:57:12, 12.26s/it] + 76%|███████▋ | 5631/7378 [19:19:00<5:58:48, 12.32s/it] + +{'loss': 0.4732, 'learning_rate': 2.799760706935658e-06, 'epoch': 0.76} + + 76%|███████▋ | 5631/7378 [19:19:00<5:58:48, 12.32s/it] + 76%|███████▋ | 5632/7378 [19:19:12<5:55:40, 12.22s/it] + +{'loss': 0.4147, 'learning_rate': 2.796714858390798e-06, 'epoch': 0.76} + + 76%|███████▋ | 5632/7378 [19:19:12<5:55:40, 12.22s/it] + 76%|███████▋ | 5633/7378 [19:19:25<5:58:34, 12.33s/it] + +{'loss': 0.4336, 'learning_rate': 2.7936703981662595e-06, 'epoch': 0.76} + + 76%|███████▋ | 5633/7378 [19:19:25<5:58:34, 12.33s/it] + 76%|███████▋ | 5634/7378 [19:19:37<5:59:47, 12.38s/it] + +{'loss': 0.4564, 'learning_rate': 2.790627326848815e-06, 'epoch': 0.76} + + 76%|███████▋ | 5634/7378 [19:19:37<5:59:47, 12.38s/it] + 76%|███████▋ | 5635/7378 [19:19:49<5:59:12, 12.37s/it] + +{'loss': 
0.4479, 'learning_rate': 2.7875856450249728e-06, 'epoch': 0.76} + + 76%|███████▋ | 5635/7378 [19:19:49<5:59:12, 12.37s/it] + 76%|███████▋ | 5636/7378 [19:20:02<5:58:39, 12.35s/it] + +{'loss': 0.455, 'learning_rate': 2.784545353280966e-06, 'epoch': 0.76} + + 76%|███████▋ | 5636/7378 [19:20:02<5:58:39, 12.35s/it] + 76%|███████▋ | 5637/7378 [19:20:14<5:54:39, 12.22s/it] + +{'loss': 0.4265, 'learning_rate': 2.7815064522027645e-06, 'epoch': 0.76} + + 76%|███████▋ | 5637/7378 [19:20:14<5:54:39, 12.22s/it] + 76%|███████▋ | 5638/7378 [19:20:26<5:53:44, 12.20s/it] + +{'loss': 0.3768, 'learning_rate': 2.7784689423760656e-06, 'epoch': 0.76} + + 76%|███████▋ | 5638/7378 [19:20:26<5:53:44, 12.20s/it] + 76%|███████▋ | 5639/7378 [19:20:38<5:52:46, 12.17s/it] + +{'loss': 0.4351, 'learning_rate': 2.775432824386307e-06, 'epoch': 0.76} + + 76%|███████▋ | 5639/7378 [19:20:38<5:52:46, 12.17s/it] + 76%|███████▋ | 5640/7378 [19:20:50<5:52:10, 12.16s/it] + +{'loss': 0.4408, 'learning_rate': 2.7723980988186514e-06, 'epoch': 0.76} + + 76%|███████▋ | 5640/7378 [19:20:50<5:52:10, 12.16s/it] + 76%|███████▋ | 5641/7378 [19:21:02<5:51:08, 12.13s/it] + +{'loss': 0.4594, 'learning_rate': 2.7693647662579927e-06, 'epoch': 0.76} + + 76%|███████▋ | 5641/7378 [19:21:02<5:51:08, 12.13s/it] + 76%|███████▋ | 5642/7378 [19:21:15<5:54:46, 12.26s/it] + +{'loss': 0.4607, 'learning_rate': 2.7663328272889588e-06, 'epoch': 0.76} + + 76%|███████▋ | 5642/7378 [19:21:15<5:54:46, 12.26s/it] + 76%|███████▋ | 5643/7378 [19:21:27<5:54:53, 12.27s/it] + +{'loss': 0.4743, 'learning_rate': 2.7633022824959055e-06, 'epoch': 0.76} + + 76%|███████▋ | 5643/7378 [19:21:27<5:54:53, 12.27s/it] + 76%|███████▋ | 5644/7378 [19:21:39<5:55:19, 12.29s/it] + +{'loss': 0.338, 'learning_rate': 2.7602731324629294e-06, 'epoch': 0.76} + + 76%|███████▋ | 5644/7378 [19:21:39<5:55:19, 12.29s/it] + 77%|███████▋ | 5645/7378 [19:21:52<5:56:10, 12.33s/it] + +{'loss': 0.4457, 'learning_rate': 2.7572453777738474e-06, 'epoch': 0.77} + + 77%|███████▋ | 
5645/7378 [19:21:52<5:56:10, 12.33s/it] + 77%|███████▋ | 5646/7378 [19:22:04<5:51:39, 12.18s/it] + +{'loss': 0.4096, 'learning_rate': 2.7542190190122133e-06, 'epoch': 0.77} + + 77%|███████▋ | 5646/7378 [19:22:04<5:51:39, 12.18s/it] + 77%|███████▋ | 5647/7378 [19:22:16<5:52:40, 12.22s/it] + +{'loss': 0.4306, 'learning_rate': 2.751194056761306e-06, 'epoch': 0.77} + + 77%|███████▋ | 5647/7378 [19:22:16<5:52:40, 12.22s/it] + 77%|███████▋ | 5648/7378 [19:22:28<5:50:38, 12.16s/it] + +{'loss': 0.4272, 'learning_rate': 2.7481704916041475e-06, 'epoch': 0.77} + + 77%|███████▋ | 5648/7378 [19:22:28<5:50:38, 12.16s/it] + 77%|███████▋ | 5649/7378 [19:22:40<5:49:17, 12.12s/it] + +{'loss': 0.4098, 'learning_rate': 2.745148324123477e-06, 'epoch': 0.77} + + 77%|███████▋ | 5649/7378 [19:22:40<5:49:17, 12.12s/it] + 77%|███████▋ | 5650/7378 [19:22:53<5:53:08, 12.26s/it] + +{'loss': 0.4785, 'learning_rate': 2.7421275549017722e-06, 'epoch': 0.77} + + 77%|███████▋ | 5650/7378 [19:22:53<5:53:08, 12.26s/it] + 77%|███████▋ | 5651/7378 [19:23:05<5:57:06, 12.41s/it] + +{'loss': 0.429, 'learning_rate': 2.7391081845212376e-06, 'epoch': 0.77} + + 77%|███████▋ | 5651/7378 [19:23:05<5:57:06, 12.41s/it] + 77%|███████▋ | 5652/7378 [19:23:18<5:56:18, 12.39s/it] + +{'loss': 0.485, 'learning_rate': 2.7360902135638066e-06, 'epoch': 0.77} + + 77%|███████▋ | 5652/7378 [19:23:18<5:56:18, 12.39s/it] + 77%|███████▋ | 5653/7378 [19:23:30<5:54:01, 12.31s/it] + +{'loss': 0.4565, 'learning_rate': 2.7330736426111525e-06, 'epoch': 0.77} + + 77%|███████▋ | 5653/7378 [19:23:30<5:54:01, 12.31s/it] + 77%|███████▋ | 5654/7378 [19:23:43<5:58:18, 12.47s/it] + +{'loss': 0.4034, 'learning_rate': 2.7300584722446676e-06, 'epoch': 0.77} + + 77%|███████▋ | 5654/7378 [19:23:43<5:58:18, 12.47s/it] + 77%|███████▋ | 5655/7378 [19:23:55<5:57:11, 12.44s/it] + +{'loss': 0.4419, 'learning_rate': 2.7270447030454784e-06, 'epoch': 0.77} + + 77%|███████▋ | 5655/7378 [19:23:55<5:57:11, 12.44s/it] + 77%|███████▋ | 5656/7378 
[19:24:07<5:53:56, 12.33s/it] + +{'loss': 0.4106, 'learning_rate': 2.7240323355944454e-06, 'epoch': 0.77} + + 77%|███████▋ | 5656/7378 [19:24:07<5:53:56, 12.33s/it] + 77%|███████▋ | 5657/7378 [19:24:20<5:54:45, 12.37s/it] + +{'loss': 0.4621, 'learning_rate': 2.72102137047215e-06, 'epoch': 0.77} + + 77%|███████▋ | 5657/7378 [19:24:20<5:54:45, 12.37s/it] + 77%|███████▋ | 5658/7378 [19:24:32<5:59:10, 12.53s/it] + +{'loss': 0.4606, 'learning_rate': 2.718011808258915e-06, 'epoch': 0.77} + + 77%|███████▋ | 5658/7378 [19:24:32<5:59:10, 12.53s/it] + 77%|███████▋ | 5659/7378 [19:24:45<5:55:27, 12.41s/it] + +{'loss': 0.5125, 'learning_rate': 2.715003649534783e-06, 'epoch': 0.77} + + 77%|███████▋ | 5659/7378 [19:24:45<5:55:27, 12.41s/it] + 77%|███████▋ | 5660/7378 [19:24:57<5:57:29, 12.48s/it] + +{'loss': 0.4391, 'learning_rate': 2.7119968948795285e-06, 'epoch': 0.77} + + 77%|███████▋ | 5660/7378 [19:24:57<5:57:29, 12.48s/it] + 77%|███████▋ | 5661/7378 [19:25:10<5:55:42, 12.43s/it] + +{'loss': 0.4269, 'learning_rate': 2.708991544872658e-06, 'epoch': 0.77} + + 77%|███████▋ | 5661/7378 [19:25:10<5:55:42, 12.43s/it] + 77%|███████▋ | 5662/7378 [19:25:22<5:53:40, 12.37s/it] + +{'loss': 0.3902, 'learning_rate': 2.7059876000934006e-06, 'epoch': 0.77} + + 77%|███████▋ | 5662/7378 [19:25:22<5:53:40, 12.37s/it] + 77%|███████▋ | 5663/7378 [19:25:34<5:51:18, 12.29s/it] + +{'loss': 0.4655, 'learning_rate': 2.7029850611207277e-06, 'epoch': 0.77} + + 77%|███████▋ | 5663/7378 [19:25:34<5:51:18, 12.29s/it] + 77%|███████▋ | 5664/7378 [19:25:47<5:55:00, 12.43s/it] + +{'loss': 0.4583, 'learning_rate': 2.6999839285333272e-06, 'epoch': 0.77} + + 77%|███████▋ | 5664/7378 [19:25:47<5:55:00, 12.43s/it] + 77%|███████▋ | 5665/7378 [19:25:59<5:51:11, 12.30s/it] + +{'loss': 0.4065, 'learning_rate': 2.6969842029096217e-06, 'epoch': 0.77} + + 77%|███████▋ | 5665/7378 [19:25:59<5:51:11, 12.30s/it] + 77%|███████▋ | 5666/7378 [19:26:11<5:53:05, 12.37s/it] + +{'loss': 0.4125, 'learning_rate': 
2.6939858848277566e-06, 'epoch': 0.77} + + 77%|███████▋ | 5666/7378 [19:26:11<5:53:05, 12.37s/it] + 77%|███████▋ | 5667/7378 [19:26:24<5:52:41, 12.37s/it] + +{'loss': 0.4461, 'learning_rate': 2.690988974865617e-06, 'epoch': 0.77} + + 77%|███████▋ | 5667/7378 [19:26:24<5:52:41, 12.37s/it] + 77%|███████▋ | 5668/7378 [19:26:36<5:53:20, 12.40s/it] + +{'loss': 0.4766, 'learning_rate': 2.6879934736008097e-06, 'epoch': 0.77} + + 77%|███████▋ | 5668/7378 [19:26:36<5:53:20, 12.40s/it] + 77%|███████▋ | 5669/7378 [19:26:48<5:50:56, 12.32s/it] + +{'loss': 0.397, 'learning_rate': 2.684999381610668e-06, 'epoch': 0.77} + + 77%|███████▋ | 5669/7378 [19:26:48<5:50:56, 12.32s/it] + 77%|███████▋ | 5670/7378 [19:27:00<5:46:26, 12.17s/it] + +{'loss': 0.466, 'learning_rate': 2.682006699472256e-06, 'epoch': 0.77} + + 77%|███████▋ | 5670/7378 [19:27:00<5:46:26, 12.17s/it] + 77%|███████▋ | 5671/7378 [19:27:12<5:46:39, 12.19s/it] + +{'loss': 0.4769, 'learning_rate': 2.679015427762366e-06, 'epoch': 0.77} + + 77%|███████▋ | 5671/7378 [19:27:12<5:46:39, 12.19s/it] + 77%|███████▋ | 5672/7378 [19:27:24<5:45:56, 12.17s/it] + +{'loss': 0.4805, 'learning_rate': 2.676025567057522e-06, 'epoch': 0.77} + + 77%|███████▋ | 5672/7378 [19:27:24<5:45:56, 12.17s/it] + 77%|███████▋ | 5673/7378 [19:27:36<5:43:18, 12.08s/it] + +{'loss': 0.4202, 'learning_rate': 2.673037117933971e-06, 'epoch': 0.77} + + 77%|███████▋ | 5673/7378 [19:27:36<5:43:18, 12.08s/it] + 77%|███████▋ | 5674/7378 [19:27:48<5:42:42, 12.07s/it] + +{'loss': 0.4114, 'learning_rate': 2.670050080967689e-06, 'epoch': 0.77} + + 77%|███████▋ | 5674/7378 [19:27:48<5:42:42, 12.07s/it] + 77%|███████▋ | 5675/7378 [19:28:00<5:41:58, 12.05s/it] + +{'loss': 0.4507, 'learning_rate': 2.6670644567343793e-06, 'epoch': 0.77} + + 77%|███████▋ | 5675/7378 [19:28:00<5:41:58, 12.05s/it] + 77%|███████▋ | 5676/7378 [19:28:12<5:41:47, 12.05s/it] + +{'loss': 0.4684, 'learning_rate': 2.6640802458094783e-06, 'epoch': 0.77} + + 77%|███████▋ | 5676/7378 [19:28:12<5:41:47, 
12.05s/it] + 77%|███████▋ | 5677/7378 [19:28:24<5:40:42, 12.02s/it] + +{'loss': 0.4168, 'learning_rate': 2.661097448768144e-06, 'epoch': 0.77} + + 77%|███████▋ | 5677/7378 [19:28:24<5:40:42, 12.02s/it] + 77%|███████▋ | 5678/7378 [19:28:36<5:39:43, 11.99s/it] + +{'loss': 0.4672, 'learning_rate': 2.6581160661852635e-06, 'epoch': 0.77} + + 77%|███████▋ | 5678/7378 [19:28:36<5:39:43, 11.99s/it] + 77%|███████▋ | 5679/7378 [19:28:49<5:43:00, 12.11s/it] + +{'loss': 0.4484, 'learning_rate': 2.6551360986354514e-06, 'epoch': 0.77} + + 77%|███████▋ | 5679/7378 [19:28:49<5:43:00, 12.11s/it] + 77%|███████▋ | 5680/7378 [19:29:01<5:45:46, 12.22s/it] + +{'loss': 0.4015, 'learning_rate': 2.652157546693046e-06, 'epoch': 0.77} + + 77%|███████▋ | 5680/7378 [19:29:01<5:45:46, 12.22s/it] + 77%|███████▋ | 5681/7378 [19:29:13<5:42:54, 12.12s/it] + +{'loss': 0.4279, 'learning_rate': 2.649180410932124e-06, 'epoch': 0.77} + + 77%|███████▋ | 5681/7378 [19:29:13<5:42:54, 12.12s/it] + 77%|███████▋ | 5682/7378 [19:29:25<5:43:29, 12.15s/it] + +{'loss': 0.4476, 'learning_rate': 2.6462046919264782e-06, 'epoch': 0.77} + + 77%|███████▋ | 5682/7378 [19:29:25<5:43:29, 12.15s/it] + 77%|███████▋ | 5683/7378 [19:29:38<5:45:25, 12.23s/it] + +{'loss': 0.3718, 'learning_rate': 2.6432303902496315e-06, 'epoch': 0.77} + + 77%|███████▋ | 5683/7378 [19:29:38<5:45:25, 12.23s/it] + 77%|███████▋ | 5684/7378 [19:29:50<5:43:34, 12.17s/it] + +{'loss': 0.3056, 'learning_rate': 2.6402575064748337e-06, 'epoch': 0.77} + + 77%|███████▋ | 5684/7378 [19:29:50<5:43:34, 12.17s/it] + 77%|███████▋ | 5685/7378 [19:30:02<5:45:53, 12.26s/it] + +{'loss': 0.415, 'learning_rate': 2.637286041175059e-06, 'epoch': 0.77} + + 77%|███████▋ | 5685/7378 [19:30:02<5:45:53, 12.26s/it] + 77%|███████▋ | 5686/7378 [19:30:14<5:44:31, 12.22s/it] + +{'loss': 0.4233, 'learning_rate': 2.634315994923017e-06, 'epoch': 0.77} + + 77%|███████▋ | 5686/7378 [19:30:14<5:44:31, 12.22s/it] + 77%|███████▋ | 5687/7378 [19:30:27<5:47:59, 12.35s/it] + +{'loss': 
0.4718, 'learning_rate': 2.631347368291134e-06, 'epoch': 0.77} + + 77%|███████▋ | 5687/7378 [19:30:27<5:47:59, 12.35s/it] + 77%|███████▋ | 5688/7378 [19:30:39<5:45:51, 12.28s/it] + +{'loss': 0.4475, 'learning_rate': 2.628380161851567e-06, 'epoch': 0.77} + + 77%|███████▋ | 5688/7378 [19:30:39<5:45:51, 12.28s/it] + 77%|███████▋ | 5689/7378 [19:30:51<5:43:22, 12.20s/it] + +{'loss': 0.4124, 'learning_rate': 2.6254143761761942e-06, 'epoch': 0.77} + + 77%|███████▋ | 5689/7378 [19:30:51<5:43:22, 12.20s/it] + 77%|███████▋ | 5690/7378 [19:31:04<5:46:55, 12.33s/it] + +{'loss': 0.4377, 'learning_rate': 2.6224500118366313e-06, 'epoch': 0.77} + + 77%|███████▋ | 5690/7378 [19:31:04<5:46:55, 12.33s/it] + 77%|███████▋ | 5691/7378 [19:31:16<5:44:20, 12.25s/it] + +{'loss': 0.4521, 'learning_rate': 2.6194870694042097e-06, 'epoch': 0.77} + + 77%|███████▋ | 5691/7378 [19:31:16<5:44:20, 12.25s/it] + 77%|███████▋ | 5692/7378 [19:31:28<5:49:20, 12.43s/it] + +{'loss': 0.4454, 'learning_rate': 2.616525549449991e-06, 'epoch': 0.77} + + 77%|███████▋ | 5692/7378 [19:31:29<5:49:20, 12.43s/it] + 77%|███████▋ | 5693/7378 [19:31:41<5:47:04, 12.36s/it] + +{'loss': 0.4476, 'learning_rate': 2.6135654525447607e-06, 'epoch': 0.77} + + 77%|███████▋ | 5693/7378 [19:31:41<5:47:04, 12.36s/it] + 77%|███████▋ | 5694/7378 [19:31:53<5:46:00, 12.33s/it] + +{'loss': 0.4507, 'learning_rate': 2.6106067792590284e-06, 'epoch': 0.77} + + 77%|███████▋ | 5694/7378 [19:31:53<5:46:00, 12.33s/it] + 77%|███████▋ | 5695/7378 [19:32:05<5:45:28, 12.32s/it] + +{'loss': 0.4495, 'learning_rate': 2.6076495301630387e-06, 'epoch': 0.77} + + 77%|███████▋ | 5695/7378 [19:32:05<5:45:28, 12.32s/it] + 77%|███████▋ | 5696/7378 [19:32:17<5:43:08, 12.24s/it] + +{'loss': 0.5094, 'learning_rate': 2.604693705826751e-06, 'epoch': 0.77} + + 77%|███████▋ | 5696/7378 [19:32:17<5:43:08, 12.24s/it] + 77%|███████▋ | 5697/7378 [19:32:30<5:44:43, 12.30s/it] + +{'loss': 0.4449, 'learning_rate': 2.601739306819854e-06, 'epoch': 0.77} + + 77%|███████▋ | 
5697/7378 [19:32:30<5:44:43, 12.30s/it] + 77%|███████▋ | 5698/7378 [19:32:42<5:46:36, 12.38s/it] + +{'loss': 0.4593, 'learning_rate': 2.5987863337117604e-06, 'epoch': 0.77} + + 77%|███████▋ | 5698/7378 [19:32:42<5:46:36, 12.38s/it] + 77%|███████▋ | 5699/7378 [19:32:55<5:45:44, 12.36s/it] + +{'loss': 0.4801, 'learning_rate': 2.5958347870716106e-06, 'epoch': 0.77} + + 77%|███████▋ | 5699/7378 [19:32:55<5:45:44, 12.36s/it] + 77%|███████▋ | 5700/7378 [19:33:07<5:46:48, 12.40s/it] + +{'loss': 0.4756, 'learning_rate': 2.592884667468273e-06, 'epoch': 0.77} + + 77%|███████▋ | 5700/7378 [19:33:07<5:46:48, 12.40s/it] + 77%|███████▋ | 5701/7378 [19:33:19<5:45:42, 12.37s/it] + +{'loss': 0.4355, 'learning_rate': 2.5899359754703334e-06, 'epoch': 0.77} + + 77%|███████▋ | 5701/7378 [19:33:19<5:45:42, 12.37s/it] + 77%|███████▋ | 5702/7378 [19:33:32<5:47:28, 12.44s/it] + +{'loss': 0.4789, 'learning_rate': 2.5869887116461055e-06, 'epoch': 0.77} + + 77%|███████▋ | 5702/7378 [19:33:32<5:47:28, 12.44s/it] + 77%|███████▋ | 5703/7378 [19:33:44<5:44:59, 12.36s/it] + +{'loss': 0.4587, 'learning_rate': 2.5840428765636304e-06, 'epoch': 0.77} + + 77%|███████▋ | 5703/7378 [19:33:44<5:44:59, 12.36s/it] + 77%|███████▋ | 5704/7378 [19:33:56<5:43:21, 12.31s/it] + +{'loss': 0.4483, 'learning_rate': 2.581098470790667e-06, 'epoch': 0.77} + + 77%|███████▋ | 5704/7378 [19:33:56<5:43:21, 12.31s/it] + 77%|███████▋ | 5705/7378 [19:34:09<5:44:16, 12.35s/it] + +{'loss': 0.4789, 'learning_rate': 2.5781554948947097e-06, 'epoch': 0.77} + + 77%|███████▋ | 5705/7378 [19:34:09<5:44:16, 12.35s/it] + 77%|███████▋ | 5706/7378 [19:34:21<5:44:23, 12.36s/it] + +{'loss': 0.4546, 'learning_rate': 2.5752139494429673e-06, 'epoch': 0.77} + + 77%|███████▋ | 5706/7378 [19:34:21<5:44:23, 12.36s/it] + 77%|███████▋ | 5707/7378 [19:34:33<5:41:36, 12.27s/it] + +{'loss': 0.4416, 'learning_rate': 2.5722738350023768e-06, 'epoch': 0.77} + + 77%|███████▋ | 5707/7378 [19:34:33<5:41:36, 12.27s/it] + 77%|███████▋ | 5708/7378 
[19:34:45<5:40:56, 12.25s/it] + +{'loss': 0.4536, 'learning_rate': 2.569335152139597e-06, 'epoch': 0.77} + + 77%|███████▋ | 5708/7378 [19:34:45<5:40:56, 12.25s/it] + 77%|███████▋ | 5709/7378 [19:34:58<5:40:24, 12.24s/it] + +{'loss': 0.4248, 'learning_rate': 2.5663979014210194e-06, 'epoch': 0.77} + + 77%|███████▋ | 5709/7378 [19:34:58<5:40:24, 12.24s/it] + 77%|███████▋ | 5710/7378 [19:35:10<5:39:22, 12.21s/it] + +{'loss': 0.4911, 'learning_rate': 2.5634620834127476e-06, 'epoch': 0.77} + + 77%|███████▋ | 5710/7378 [19:35:10<5:39:22, 12.21s/it] + 77%|███████▋ | 5711/7378 [19:35:22<5:43:20, 12.36s/it] + +{'loss': 0.4676, 'learning_rate': 2.560527698680617e-06, 'epoch': 0.77} + + 77%|███████▋ | 5711/7378 [19:35:23<5:43:20, 12.36s/it] + 77%|███████▋ | 5712/7378 [19:35:35<5:42:29, 12.33s/it] + +{'loss': 0.4854, 'learning_rate': 2.5575947477901843e-06, 'epoch': 0.77} + + 77%|███████▋ | 5712/7378 [19:35:35<5:42:29, 12.33s/it] + 77%|███████▋ | 5713/7378 [19:35:47<5:39:58, 12.25s/it] + +{'loss': 0.4555, 'learning_rate': 2.554663231306724e-06, 'epoch': 0.77} + + 77%|███████▋ | 5713/7378 [19:35:47<5:39:58, 12.25s/it] + 77%|███████▋ | 5714/7378 [19:35:59<5:39:48, 12.25s/it] + +{'loss': 0.4315, 'learning_rate': 2.551733149795249e-06, 'epoch': 0.77} + + 77%|███████▋ | 5714/7378 [19:35:59<5:39:48, 12.25s/it] + 77%|███████▋ | 5715/7378 [19:36:11<5:40:32, 12.29s/it] + +{'loss': 0.5089, 'learning_rate': 2.5488045038204823e-06, 'epoch': 0.77} + + 77%|███████▋ | 5715/7378 [19:36:11<5:40:32, 12.29s/it] + 77%|███████▋ | 5716/7378 [19:36:24<5:38:59, 12.24s/it] + +{'loss': 0.449, 'learning_rate': 2.5458772939468733e-06, 'epoch': 0.77} + + 77%|███████▋ | 5716/7378 [19:36:24<5:38:59, 12.24s/it] + 77%|███████▋ | 5717/7378 [19:36:35<5:34:40, 12.09s/it] + +{'loss': 0.4474, 'learning_rate': 2.5429515207385957e-06, 'epoch': 0.77} + + 77%|███████▋ | 5717/7378 [19:36:35<5:34:40, 12.09s/it] + 78%|███████▊ | 5718/7378 [19:36:48<5:36:34, 12.17s/it] + +{'loss': 0.4081, 'learning_rate': 
2.5400271847595503e-06, 'epoch': 0.78} + + 78%|███████▊ | 5718/7378 [19:36:48<5:36:34, 12.17s/it] + 78%|███████▊ | 5719/7378 [19:37:00<5:38:53, 12.26s/it] + +{'loss': 0.453, 'learning_rate': 2.5371042865733552e-06, 'epoch': 0.78} + + 78%|███████▊ | 5719/7378 [19:37:00<5:38:53, 12.26s/it] + 78%|███████▊ | 5720/7378 [19:37:13<5:42:30, 12.39s/it] + +{'loss': 0.4805, 'learning_rate': 2.5341828267433523e-06, 'epoch': 0.78} + + 78%|███████▊ | 5720/7378 [19:37:13<5:42:30, 12.39s/it] + 78%|███████▊ | 5721/7378 [19:37:25<5:39:15, 12.28s/it] + +{'loss': 0.4255, 'learning_rate': 2.531262805832607e-06, 'epoch': 0.78} + + 78%|███████▊ | 5721/7378 [19:37:25<5:39:15, 12.28s/it] + 78%|███████▊ | 5722/7378 [19:37:37<5:38:15, 12.26s/it] + +{'loss': 0.4755, 'learning_rate': 2.528344224403906e-06, 'epoch': 0.78} + + 78%|███████▊ | 5722/7378 [19:37:37<5:38:15, 12.26s/it] + 78%|███████▊ | 5723/7378 [19:37:50<5:39:55, 12.32s/it] + +{'loss': 0.4056, 'learning_rate': 2.5254270830197635e-06, 'epoch': 0.78} + + 78%|███████▊ | 5723/7378 [19:37:50<5:39:55, 12.32s/it] + 78%|███████▊ | 5724/7378 [19:38:02<5:42:18, 12.42s/it] + +{'loss': 0.4474, 'learning_rate': 2.522511382242413e-06, 'epoch': 0.78} + + 78%|███████▊ | 5724/7378 [19:38:02<5:42:18, 12.42s/it] + 78%|███████▊ | 5725/7378 [19:38:14<5:39:28, 12.32s/it] + +{'loss': 0.4419, 'learning_rate': 2.519597122633809e-06, 'epoch': 0.78} + + 78%|███████▊ | 5725/7378 [19:38:14<5:39:28, 12.32s/it] + 78%|███████▊ | 5726/7378 [19:38:26<5:35:18, 12.18s/it] + +{'loss': 0.4768, 'learning_rate': 2.5166843047556256e-06, 'epoch': 0.78} + + 78%|███████▊ | 5726/7378 [19:38:26<5:35:18, 12.18s/it] + 78%|███████▊ | 5727/7378 [19:38:39<5:38:24, 12.30s/it] + +{'loss': 0.4411, 'learning_rate': 2.513772929169268e-06, 'epoch': 0.78} + + 78%|███████▊ | 5727/7378 [19:38:39<5:38:24, 12.30s/it] + 78%|███████▊ | 5728/7378 [19:38:50<5:32:16, 12.08s/it] + +{'loss': 0.4391, 'learning_rate': 2.5108629964358577e-06, 'epoch': 0.78} + + 78%|███████▊ | 5728/7378 
[19:38:50<5:32:16, 12.08s/it] + 78%|███████▊ | 5729/7378 [19:39:02<5:32:10, 12.09s/it] + +{'loss': 0.4157, 'learning_rate': 2.507954507116237e-06, 'epoch': 0.78} + + 78%|███████▊ | 5729/7378 [19:39:02<5:32:10, 12.09s/it] + 78%|███████▊ | 5730/7378 [19:39:15<5:32:28, 12.10s/it] + +{'loss': 0.4508, 'learning_rate': 2.5050474617709718e-06, 'epoch': 0.78} + + 78%|███████▊ | 5730/7378 [19:39:15<5:32:28, 12.10s/it] + 78%|███████▊ | 5731/7378 [19:39:27<5:33:26, 12.15s/it] + +{'loss': 0.4833, 'learning_rate': 2.5021418609603477e-06, 'epoch': 0.78} + + 78%|███████▊ | 5731/7378 [19:39:27<5:33:26, 12.15s/it] + 78%|███████▊ | 5732/7378 [19:39:39<5:33:27, 12.16s/it] + +{'loss': 0.355, 'learning_rate': 2.4992377052443783e-06, 'epoch': 0.78} + + 78%|███████▊ | 5732/7378 [19:39:39<5:33:27, 12.16s/it] + 78%|███████▊ | 5733/7378 [19:39:51<5:35:10, 12.23s/it] + +{'loss': 0.4632, 'learning_rate': 2.4963349951827907e-06, 'epoch': 0.78} + + 78%|███████▊ | 5733/7378 [19:39:51<5:35:10, 12.23s/it] + 78%|███████▊ | 5734/7378 [19:40:03<5:33:00, 12.15s/it] + +{'loss': 0.3859, 'learning_rate': 2.4934337313350386e-06, 'epoch': 0.78} + + 78%|███████▊ | 5734/7378 [19:40:03<5:33:00, 12.15s/it] + 78%|███████▊ | 5735/7378 [19:40:16<5:35:14, 12.24s/it] + +{'loss': 0.4307, 'learning_rate': 2.4905339142602938e-06, 'epoch': 0.78} + + 78%|███████▊ | 5735/7378 [19:40:16<5:35:14, 12.24s/it] + 78%|███████▊ | 5736/7378 [19:40:28<5:32:37, 12.15s/it] + +{'loss': 0.4209, 'learning_rate': 2.487635544517448e-06, 'epoch': 0.78} + + 78%|███████▊ | 5736/7378 [19:40:28<5:32:37, 12.15s/it] + 78%|███████▊ | 5737/7378 [19:40:40<5:32:19, 12.15s/it] + +{'loss': 0.3681, 'learning_rate': 2.4847386226651227e-06, 'epoch': 0.78} + + 78%|███████▊ | 5737/7378 [19:40:40<5:32:19, 12.15s/it] + 78%|███████▊ | 5738/7378 [19:40:52<5:34:45, 12.25s/it] + +{'loss': 0.4116, 'learning_rate': 2.48184314926165e-06, 'epoch': 0.78} + + 78%|███████▊ | 5738/7378 [19:40:52<5:34:45, 12.25s/it] + 78%|███████▊ | 5739/7378 [19:41:05<5:35:50, 
12.29s/it] + +{'loss': 0.3845, 'learning_rate': 2.47894912486509e-06, 'epoch': 0.78} + + 78%|███████▊ | 5739/7378 [19:41:05<5:35:50, 12.29s/it] + 78%|███████▊ | 5740/7378 [19:41:17<5:35:08, 12.28s/it] + +{'loss': 0.4532, 'learning_rate': 2.4760565500332135e-06, 'epoch': 0.78} + + 78%|███████▊ | 5740/7378 [19:41:17<5:35:08, 12.28s/it] + 78%|███████▊ | 5741/7378 [19:41:29<5:36:00, 12.32s/it] + +{'loss': 0.3913, 'learning_rate': 2.473165425323528e-06, 'epoch': 0.78} + + 78%|███████▊ | 5741/7378 [19:41:29<5:36:00, 12.32s/it] + 78%|███████▊ | 5742/7378 [19:41:42<5:36:01, 12.32s/it] + +{'loss': 0.4048, 'learning_rate': 2.4702757512932463e-06, 'epoch': 0.78} + + 78%|███████▊ | 5742/7378 [19:41:42<5:36:01, 12.32s/it] + 78%|███████▊ | 5743/7378 [19:41:54<5:35:08, 12.30s/it] + +{'loss': 0.4461, 'learning_rate': 2.467387528499312e-06, 'epoch': 0.78} + + 78%|███████▊ | 5743/7378 [19:41:54<5:35:08, 12.30s/it] + 78%|███████▊ | 5744/7378 [19:42:07<5:37:50, 12.41s/it] + +{'loss': 0.469, 'learning_rate': 2.4645007574983827e-06, 'epoch': 0.78} + + 78%|███████▊ | 5744/7378 [19:42:07<5:37:50, 12.41s/it] + 78%|███████▊ | 5745/7378 [19:42:19<5:37:30, 12.40s/it] + +{'loss': 0.4721, 'learning_rate': 2.4616154388468383e-06, 'epoch': 0.78} + + 78%|███████▊ | 5745/7378 [19:42:19<5:37:30, 12.40s/it] + 78%|███████▊ | 5746/7378 [19:42:31<5:36:41, 12.38s/it] + +{'loss': 0.506, 'learning_rate': 2.4587315731007765e-06, 'epoch': 0.78} + + 78%|███████▊ | 5746/7378 [19:42:31<5:36:41, 12.38s/it] + 78%|███████▊ | 5747/7378 [19:42:44<5:37:35, 12.42s/it] + +{'loss': 0.4667, 'learning_rate': 2.4558491608160217e-06, 'epoch': 0.78} + + 78%|███████▊ | 5747/7378 [19:42:44<5:37:35, 12.42s/it] + 78%|███████▊ | 5748/7378 [19:42:57<5:46:56, 12.77s/it] + +{'loss': 0.4087, 'learning_rate': 2.4529682025481118e-06, 'epoch': 0.78} + + 78%|███████▊ | 5748/7378 [19:42:57<5:46:56, 12.77s/it] + 78%|███████▊ | 5749/7378 [19:43:10<5:45:53, 12.74s/it] + +{'loss': 0.4205, 'learning_rate': 2.4500886988523065e-06, 'epoch': 
0.78} + + 78%|███████▊ | 5749/7378 [19:43:10<5:45:53, 12.74s/it] + 78%|███████▊ | 5750/7378 [19:43:22<5:41:22, 12.58s/it] + +{'loss': 0.4148, 'learning_rate': 2.4472106502835815e-06, 'epoch': 0.78} + + 78%|███████▊ | 5750/7378 [19:43:22<5:41:22, 12.58s/it] + 78%|███████▊ | 5751/7378 [19:43:35<5:43:08, 12.65s/it] + +{'loss': 0.4453, 'learning_rate': 2.444334057396641e-06, 'epoch': 0.78} + + 78%|███████▊ | 5751/7378 [19:43:35<5:43:08, 12.65s/it] + 78%|███████▊ | 5752/7378 [19:43:47<5:39:20, 12.52s/it] + +{'loss': 0.3967, 'learning_rate': 2.4414589207459018e-06, 'epoch': 0.78} + + 78%|███████▊ | 5752/7378 [19:43:47<5:39:20, 12.52s/it] + 78%|███████▊ | 5753/7378 [19:43:59<5:35:44, 12.40s/it] + +{'loss': 0.3869, 'learning_rate': 2.4385852408854993e-06, 'epoch': 0.78} + + 78%|███████▊ | 5753/7378 [19:43:59<5:35:44, 12.40s/it] + 78%|███████▊ | 5754/7378 [19:44:11<5:31:34, 12.25s/it] + +{'loss': 0.4916, 'learning_rate': 2.435713018369292e-06, 'epoch': 0.78} + + 78%|███████▊ | 5754/7378 [19:44:11<5:31:34, 12.25s/it] + 78%|███████▊ | 5755/7378 [19:44:24<5:31:40, 12.26s/it] + +{'loss': 0.5229, 'learning_rate': 2.4328422537508524e-06, 'epoch': 0.78} + + 78%|███████▊ | 5755/7378 [19:44:24<5:31:40, 12.26s/it] + 78%|███████▊ | 5756/7378 [19:44:36<5:30:33, 12.23s/it] + +{'loss': 0.4369, 'learning_rate': 2.429972947583481e-06, 'epoch': 0.78} + + 78%|███████▊ | 5756/7378 [19:44:36<5:30:33, 12.23s/it] + 78%|███████▊ | 5757/7378 [19:44:48<5:30:50, 12.25s/it] + +{'loss': 0.4087, 'learning_rate': 2.4271051004201896e-06, 'epoch': 0.78} + + 78%|███████▊ | 5757/7378 [19:44:48<5:30:50, 12.25s/it] + 78%|███████▊ | 5758/7378 [19:45:00<5:29:25, 12.20s/it] + +{'loss': 0.4645, 'learning_rate': 2.4242387128137092e-06, 'epoch': 0.78} + + 78%|███████▊ | 5758/7378 [19:45:00<5:29:25, 12.20s/it] + 78%|███████▊ | 5759/7378 [19:45:13<5:32:08, 12.31s/it] + +{'loss': 0.4386, 'learning_rate': 2.4213737853164887e-06, 'epoch': 0.78} + + 78%|███████▊ | 5759/7378 [19:45:13<5:32:08, 12.31s/it] + 78%|███████▊ | 
5760/7378 [19:45:25<5:31:34, 12.30s/it] + +{'loss': 0.3851, 'learning_rate': 2.4185103184807045e-06, 'epoch': 0.78} + + 78%|███████▊ | 5760/7378 [19:45:25<5:31:34, 12.30s/it] + 78%|███████▊ | 5761/7378 [19:45:37<5:31:02, 12.28s/it] + +{'loss': 0.4497, 'learning_rate': 2.415648312858241e-06, 'epoch': 0.78} + + 78%|███████▊ | 5761/7378 [19:45:37<5:31:02, 12.28s/it] + 78%|███████▊ | 5762/7378 [19:45:50<5:38:14, 12.56s/it] + +{'loss': 0.415, 'learning_rate': 2.412787769000706e-06, 'epoch': 0.78} + + 78%|███████▊ | 5762/7378 [19:45:50<5:38:14, 12.56s/it] + 78%|███████▊ | 5763/7378 [19:46:03<5:34:54, 12.44s/it] + +{'loss': 0.4463, 'learning_rate': 2.4099286874594243e-06, 'epoch': 0.78} + + 78%|███████▊ | 5763/7378 [19:46:03<5:34:54, 12.44s/it] + 78%|███████▊ | 5764/7378 [19:46:15<5:30:48, 12.30s/it] + +{'loss': 0.4059, 'learning_rate': 2.407071068785436e-06, 'epoch': 0.78} + + 78%|███████▊ | 5764/7378 [19:46:15<5:30:48, 12.30s/it] + 78%|███████▊ | 5765/7378 [19:46:27<5:29:02, 12.24s/it] + +{'loss': 0.4314, 'learning_rate': 2.404214913529508e-06, 'epoch': 0.78} + + 78%|███████▊ | 5765/7378 [19:46:27<5:29:02, 12.24s/it] + 78%|███████▊ | 5766/7378 [19:46:40<5:34:12, 12.44s/it] + +{'loss': 0.4443, 'learning_rate': 2.4013602222421162e-06, 'epoch': 0.78} + + 78%|███████▊ | 5766/7378 [19:46:40<5:34:12, 12.44s/it] + 78%|███████▊ | 5767/7378 [19:46:52<5:32:44, 12.39s/it] + +{'loss': 0.5095, 'learning_rate': 2.3985069954734576e-06, 'epoch': 0.78} + + 78%|███████▊ | 5767/7378 [19:46:52<5:32:44, 12.39s/it] + 78%|███████▊ | 5768/7378 [19:47:04<5:32:57, 12.41s/it] + +{'loss': 0.4868, 'learning_rate': 2.395655233773445e-06, 'epoch': 0.78} + + 78%|███████▊ | 5768/7378 [19:47:04<5:32:57, 12.41s/it] + 78%|███████▊ | 5769/7378 [19:47:16<5:29:19, 12.28s/it] + +{'loss': 0.4551, 'learning_rate': 2.392804937691716e-06, 'epoch': 0.78} + + 78%|███████▊ | 5769/7378 [19:47:16<5:29:19, 12.28s/it] + 78%|███████▊ | 5770/7378 [19:47:28<5:27:36, 12.22s/it] + +{'loss': 0.4535, 'learning_rate': 
2.389956107777618e-06, 'epoch': 0.78} + + 78%|███████▊ | 5770/7378 [19:47:28<5:27:36, 12.22s/it] + 78%|███████▊ | 5771/7378 [19:47:41<5:27:00, 12.21s/it] + +{'loss': 0.4505, 'learning_rate': 2.3871087445802175e-06, 'epoch': 0.78} + + 78%|███████▊ | 5771/7378 [19:47:41<5:27:00, 12.21s/it] + 78%|███████▊ | 5772/7378 [19:47:53<5:27:37, 12.24s/it] + +{'loss': 0.4308, 'learning_rate': 2.3842628486483e-06, 'epoch': 0.78} + + 78%|███████▊ | 5772/7378 [19:47:53<5:27:37, 12.24s/it] + 78%|███████▊ | 5773/7378 [19:48:05<5:28:23, 12.28s/it] + +{'loss': 0.4564, 'learning_rate': 2.381418420530364e-06, 'epoch': 0.78} + + 78%|███████▊ | 5773/7378 [19:48:05<5:28:23, 12.28s/it] + 78%|███████▊ | 5774/7378 [19:48:17<5:24:35, 12.14s/it] + +{'loss': 0.4002, 'learning_rate': 2.3785754607746327e-06, 'epoch': 0.78} + + 78%|███████▊ | 5774/7378 [19:48:17<5:24:35, 12.14s/it] + 78%|███████▊ | 5775/7378 [19:48:30<5:27:26, 12.26s/it] + +{'loss': 0.4292, 'learning_rate': 2.3757339699290417e-06, 'epoch': 0.78} + + 78%|███████▊ | 5775/7378 [19:48:30<5:27:26, 12.26s/it] + 78%|███████▊ | 5776/7378 [19:48:42<5:24:24, 12.15s/it] + +{'loss': 0.4407, 'learning_rate': 2.3728939485412437e-06, 'epoch': 0.78} + + 78%|███████▊ | 5776/7378 [19:48:42<5:24:24, 12.15s/it] + 78%|███████▊ | 5777/7378 [19:48:54<5:25:40, 12.21s/it] + +{'loss': 0.4675, 'learning_rate': 2.370055397158604e-06, 'epoch': 0.78} + + 78%|███████▊ | 5777/7378 [19:48:54<5:25:40, 12.21s/it] + 78%|███████▊ | 5778/7378 [19:49:07<5:29:36, 12.36s/it] + +{'loss': 0.3647, 'learning_rate': 2.3672183163282146e-06, 'epoch': 0.78} + + 78%|███████▊ | 5778/7378 [19:49:07<5:29:36, 12.36s/it] + 78%|███████▊ | 5779/7378 [19:49:19<5:28:50, 12.34s/it] + +{'loss': 0.4428, 'learning_rate': 2.3643827065968774e-06, 'epoch': 0.78} + + 78%|███████▊ | 5779/7378 [19:49:19<5:28:50, 12.34s/it] + 78%|███████▊ | 5780/7378 [19:49:31<5:27:32, 12.30s/it] + +{'loss': 0.4137, 'learning_rate': 2.3615485685111083e-06, 'epoch': 0.78} + + 78%|███████▊ | 5780/7378 
[19:49:31<5:27:32, 12.30s/it] + 78%|███████▊ | 5781/7378 [19:49:43<5:25:47, 12.24s/it] + +{'loss': 0.42, 'learning_rate': 2.3587159026171468e-06, 'epoch': 0.78} + + 78%|███████▊ | 5781/7378 [19:49:43<5:25:47, 12.24s/it] + 78%|███████▊ | 5782/7378 [19:49:55<5:25:27, 12.24s/it] + +{'loss': 0.445, 'learning_rate': 2.3558847094609406e-06, 'epoch': 0.78} + + 78%|███████▊ | 5782/7378 [19:49:55<5:25:27, 12.24s/it] + 78%|███████▊ | 5783/7378 [19:50:08<5:24:49, 12.22s/it] + +{'loss': 0.4497, 'learning_rate': 2.353054989588163e-06, 'epoch': 0.78} + + 78%|███████▊ | 5783/7378 [19:50:08<5:24:49, 12.22s/it] + 78%|███████▊ | 5784/7378 [19:50:19<5:21:06, 12.09s/it] + +{'loss': 0.423, 'learning_rate': 2.3502267435441938e-06, 'epoch': 0.78} + + 78%|███████▊ | 5784/7378 [19:50:19<5:21:06, 12.09s/it] + 78%|███████▊ | 5785/7378 [19:50:32<5:23:14, 12.17s/it] + +{'loss': 0.4514, 'learning_rate': 2.347399971874137e-06, 'epoch': 0.78} + + 78%|███████▊ | 5785/7378 [19:50:32<5:23:14, 12.17s/it] + 78%|███████▊ | 5786/7378 [19:50:44<5:21:21, 12.11s/it] + +{'loss': 0.4417, 'learning_rate': 2.3445746751228025e-06, 'epoch': 0.78} + + 78%|███████▊ | 5786/7378 [19:50:44<5:21:21, 12.11s/it] + 78%|███████▊ | 5787/7378 [19:50:56<5:24:06, 12.22s/it] + +{'loss': 0.4902, 'learning_rate': 2.3417508538347265e-06, 'epoch': 0.78} + + 78%|███████▊ | 5787/7378 [19:50:56<5:24:06, 12.22s/it] + 78%|███████▊ | 5788/7378 [19:51:09<5:26:41, 12.33s/it] + +{'loss': 0.4716, 'learning_rate': 2.338928508554158e-06, 'epoch': 0.78} + + 78%|███████▊ | 5788/7378 [19:51:09<5:26:41, 12.33s/it] + 78%|███████▊ | 5789/7378 [19:51:21<5:23:33, 12.22s/it] + +{'loss': 0.3972, 'learning_rate': 2.336107639825058e-06, 'epoch': 0.78} + + 78%|███████▊ | 5789/7378 [19:51:21<5:23:33, 12.22s/it] + 78%|███████▊ | 5790/7378 [19:51:33<5:20:53, 12.12s/it] + +{'loss': 0.4151, 'learning_rate': 2.3332882481911032e-06, 'epoch': 0.78} + + 78%|███████▊ | 5790/7378 [19:51:33<5:20:53, 12.12s/it] + 78%|███████▊ | 5791/7378 [19:51:45<5:21:27, 12.15s/it] 
+ +{'loss': 0.3706, 'learning_rate': 2.3304703341956893e-06, 'epoch': 0.78} + + 78%|███████▊ | 5791/7378 [19:51:45<5:21:27, 12.15s/it] + 79%|███████▊ | 5792/7378 [19:51:57<5:20:18, 12.12s/it] + +{'loss': 0.4977, 'learning_rate': 2.327653898381921e-06, 'epoch': 0.79} + + 79%|███████▊ | 5792/7378 [19:51:57<5:20:18, 12.12s/it] + 79%|███████▊ | 5793/7378 [19:52:09<5:20:02, 12.12s/it] + +{'loss': 0.4056, 'learning_rate': 2.3248389412926277e-06, 'epoch': 0.79} + + 79%|███████▊ | 5793/7378 [19:52:09<5:20:02, 12.12s/it] + 79%|███████▊ | 5794/7378 [19:52:21<5:18:26, 12.06s/it] + +{'loss': 0.4379, 'learning_rate': 2.3220254634703452e-06, 'epoch': 0.79} + + 79%|███████▊ | 5794/7378 [19:52:21<5:18:26, 12.06s/it] + 79%|███████▊ | 5795/7378 [19:52:33<5:21:20, 12.18s/it] + +{'loss': 0.4332, 'learning_rate': 2.3192134654573285e-06, 'epoch': 0.79} + + 79%|███████▊ | 5795/7378 [19:52:33<5:21:20, 12.18s/it] + 79%|███████▊ | 5796/7378 [19:52:46<5:22:59, 12.25s/it] + +{'loss': 0.3964, 'learning_rate': 2.3164029477955454e-06, 'epoch': 0.79} + + 79%|███████▊ | 5796/7378 [19:52:46<5:22:59, 12.25s/it] + 79%|███████▊ | 5797/7378 [19:52:58<5:21:31, 12.20s/it] + +{'loss': 0.4514, 'learning_rate': 2.313593911026676e-06, 'epoch': 0.79} + + 79%|███████▊ | 5797/7378 [19:52:58<5:21:31, 12.20s/it] + 79%|███████▊ | 5798/7378 [19:53:10<5:22:05, 12.23s/it] + +{'loss': 0.4826, 'learning_rate': 2.310786355692124e-06, 'epoch': 0.79} + + 79%|███████▊ | 5798/7378 [19:53:10<5:22:05, 12.23s/it] + 79%|███████▊ | 5799/7378 [19:53:22<5:22:08, 12.24s/it] + +{'loss': 0.437, 'learning_rate': 2.3079802823329987e-06, 'epoch': 0.79} + + 79%|███████▊ | 5799/7378 [19:53:22<5:22:08, 12.24s/it] + 79%|███████▊ | 5800/7378 [19:53:35<5:20:58, 12.20s/it] + +{'loss': 0.4648, 'learning_rate': 2.305175691490128e-06, 'epoch': 0.79} + + 79%|███████▊ | 5800/7378 [19:53:35<5:20:58, 12.20s/it] + 79%|███████▊ | 5801/7378 [19:53:47<5:22:34, 12.27s/it] + +{'loss': 0.4502, 'learning_rate': 2.302372583704048e-06, 'epoch': 0.79} + + 
79%|███████▊ | 5801/7378 [19:53:47<5:22:34, 12.27s/it] + 79%|███████▊ | 5802/7378 [19:53:59<5:21:20, 12.23s/it] + +{'loss': 0.4651, 'learning_rate': 2.2995709595150208e-06, 'epoch': 0.79} + + 79%|███████▊ | 5802/7378 [19:53:59<5:21:20, 12.23s/it] + 79%|███████▊ | 5803/7378 [19:54:12<5:22:58, 12.30s/it] + +{'loss': 0.4375, 'learning_rate': 2.2967708194630122e-06, 'epoch': 0.79} + + 79%|███████▊ | 5803/7378 [19:54:12<5:22:58, 12.30s/it] + 79%|███████▊ | 5804/7378 [19:54:24<5:21:35, 12.26s/it] + +{'loss': 0.3914, 'learning_rate': 2.2939721640877054e-06, 'epoch': 0.79} + + 79%|███████▊ | 5804/7378 [19:54:24<5:21:35, 12.26s/it] + 79%|███████▊ | 5805/7378 [19:54:36<5:21:08, 12.25s/it] + +{'loss': 0.4616, 'learning_rate': 2.291174993928499e-06, 'epoch': 0.79} + + 79%|███████▊ | 5805/7378 [19:54:36<5:21:08, 12.25s/it] + 79%|███████▊ | 5806/7378 [19:54:48<5:17:22, 12.11s/it] + +{'loss': 0.3852, 'learning_rate': 2.288379309524498e-06, 'epoch': 0.79} + + 79%|███████▊ | 5806/7378 [19:54:48<5:17:22, 12.11s/it] + 79%|███████▊ | 5807/7378 [19:55:00<5:19:08, 12.19s/it] + +{'loss': 0.4288, 'learning_rate': 2.2855851114145333e-06, 'epoch': 0.79} + + 79%|███████▊ | 5807/7378 [19:55:00<5:19:08, 12.19s/it] + 79%|███████▊ | 5808/7378 [19:55:13<5:23:01, 12.34s/it] + +{'loss': 0.4804, 'learning_rate': 2.2827924001371405e-06, 'epoch': 0.79} + + 79%|███████▊ | 5808/7378 [19:55:13<5:23:01, 12.34s/it] + 79%|███████▊ | 5809/7378 [19:55:25<5:20:41, 12.26s/it] + +{'loss': 0.4078, 'learning_rate': 2.280001176230572e-06, 'epoch': 0.79} + + 79%|███████▊ | 5809/7378 [19:55:25<5:20:41, 12.26s/it] + 79%|███████▊ | 5810/7378 [19:55:37<5:20:26, 12.26s/it] + +{'loss': 0.457, 'learning_rate': 2.277211440232787e-06, 'epoch': 0.79} + + 79%|███████▊ | 5810/7378 [19:55:37<5:20:26, 12.26s/it] + 79%|███████▉ | 5811/7378 [19:55:50<5:20:59, 12.29s/it] + +{'loss': 0.4209, 'learning_rate': 2.274423192681472e-06, 'epoch': 0.79} + + 79%|███████▉ | 5811/7378 [19:55:50<5:20:59, 12.29s/it] + 79%|███████▉ | 5812/7378 
[19:56:03<5:28:40, 12.59s/it] + +{'loss': 0.4689, 'learning_rate': 2.271636434114013e-06, 'epoch': 0.79} + + 79%|███████▉ | 5812/7378 [19:56:03<5:28:40, 12.59s/it] + 79%|███████▉ | 5813/7378 [19:56:15<5:28:48, 12.61s/it] + +{'loss': 0.4162, 'learning_rate': 2.268851165067514e-06, 'epoch': 0.79} + + 79%|███████▉ | 5813/7378 [19:56:15<5:28:48, 12.61s/it] + 79%|███████▉ | 5814/7378 [19:56:27<5:23:50, 12.42s/it] + +{'loss': 0.3758, 'learning_rate': 2.2660673860787942e-06, 'epoch': 0.79} + + 79%|███████▉ | 5814/7378 [19:56:28<5:23:50, 12.42s/it] + 79%|███████▉ | 5815/7378 [19:56:40<5:24:40, 12.46s/it] + +{'loss': 0.4398, 'learning_rate': 2.2632850976843777e-06, 'epoch': 0.79} + + 79%|███████▉ | 5815/7378 [19:56:40<5:24:40, 12.46s/it] + 79%|███████▉ | 5816/7378 [19:56:52<5:22:18, 12.38s/it] + +{'loss': 0.4093, 'learning_rate': 2.260504300420515e-06, 'epoch': 0.79} + + 79%|███████▉ | 5816/7378 [19:56:52<5:22:18, 12.38s/it] + 79%|███████▉ | 5817/7378 [19:57:04<5:18:22, 12.24s/it] + +{'loss': 0.3671, 'learning_rate': 2.257724994823157e-06, 'epoch': 0.79} + + 79%|███████▉ | 5817/7378 [19:57:04<5:18:22, 12.24s/it] + 79%|███████▉ | 5818/7378 [19:57:16<5:17:47, 12.22s/it] + +{'loss': 0.4632, 'learning_rate': 2.254947181427971e-06, 'epoch': 0.79} + + 79%|███████▉ | 5818/7378 [19:57:16<5:17:47, 12.22s/it] + 79%|███████▉ | 5819/7378 [19:57:29<5:21:19, 12.37s/it] + +{'loss': 0.3787, 'learning_rate': 2.252170860770336e-06, 'epoch': 0.79} + + 79%|███████▉ | 5819/7378 [19:57:29<5:21:19, 12.37s/it] + 79%|███████▉ | 5820/7378 [19:57:42<5:22:36, 12.42s/it] + +{'loss': 0.4511, 'learning_rate': 2.2493960333853482e-06, 'epoch': 0.79} + + 79%|███████▉ | 5820/7378 [19:57:42<5:22:36, 12.42s/it] + 79%|███████▉ | 5821/7378 [19:57:54<5:21:35, 12.39s/it] + +{'loss': 0.3694, 'learning_rate': 2.24662269980781e-06, 'epoch': 0.79} + + 79%|███████▉ | 5821/7378 [19:57:54<5:21:35, 12.39s/it] + 79%|███████▉ | 5822/7378 [19:58:07<5:23:21, 12.47s/it] + +{'loss': 0.3735, 'learning_rate': 
2.243850860572239e-06, 'epoch': 0.79} + + 79%|███████▉ | 5822/7378 [19:58:07<5:23:21, 12.47s/it] + 79%|███████▉ | 5823/7378 [19:58:19<5:20:12, 12.36s/it] + +{'loss': 0.4831, 'learning_rate': 2.2410805162128603e-06, 'epoch': 0.79} + + 79%|███████▉ | 5823/7378 [19:58:19<5:20:12, 12.36s/it] + 79%|███████▉ | 5824/7378 [19:58:31<5:16:44, 12.23s/it] + +{'loss': 0.3923, 'learning_rate': 2.238311667263615e-06, 'epoch': 0.79} + + 79%|███████▉ | 5824/7378 [19:58:31<5:16:44, 12.23s/it] + 79%|███████▉ | 5825/7378 [19:58:43<5:14:58, 12.17s/it] + +{'loss': 0.4323, 'learning_rate': 2.23554431425816e-06, 'epoch': 0.79} + + 79%|███████▉ | 5825/7378 [19:58:43<5:14:58, 12.17s/it] + 79%|███████▉ | 5826/7378 [19:58:55<5:15:13, 12.19s/it] + +{'loss': 0.4063, 'learning_rate': 2.2327784577298562e-06, 'epoch': 0.79} + + 79%|███████▉ | 5826/7378 [19:58:55<5:15:13, 12.19s/it] + 79%|███████▉ | 5827/7378 [19:59:07<5:14:07, 12.15s/it] + +{'loss': 0.4507, 'learning_rate': 2.230014098211779e-06, 'epoch': 0.79} + + 79%|███████▉ | 5827/7378 [19:59:07<5:14:07, 12.15s/it] + 79%|███████▉ | 5828/7378 [19:59:19<5:12:47, 12.11s/it] + +{'loss': 0.4421, 'learning_rate': 2.2272512362367126e-06, 'epoch': 0.79} + + 79%|███████▉ | 5828/7378 [19:59:19<5:12:47, 12.11s/it] + 79%|███████▉ | 5829/7378 [19:59:31<5:11:04, 12.05s/it] + +{'loss': 0.4828, 'learning_rate': 2.2244898723371587e-06, 'epoch': 0.79} + + 79%|███████▉ | 5829/7378 [19:59:31<5:11:04, 12.05s/it] + 79%|███████▉ | 5830/7378 [19:59:43<5:09:00, 11.98s/it] + +{'loss': 0.4319, 'learning_rate': 2.2217300070453298e-06, 'epoch': 0.79} + + 79%|███████▉ | 5830/7378 [19:59:43<5:09:00, 11.98s/it] + 79%|███████▉ | 5831/7378 [19:59:55<5:12:01, 12.10s/it] + +{'loss': 0.3871, 'learning_rate': 2.2189716408931415e-06, 'epoch': 0.79} + + 79%|███████▉ | 5831/7378 [19:59:55<5:12:01, 12.10s/it] + 79%|███████▉ | 5832/7378 [20:00:07<5:12:23, 12.12s/it] + +{'loss': 0.3906, 'learning_rate': 2.21621477441223e-06, 'epoch': 0.79} + + 79%|███████▉ | 5832/7378 [20:00:07<5:12:23, 
12.12s/it] + 79%|███████▉ | 5833/7378 [20:00:19<5:13:01, 12.16s/it] + +{'loss': 0.3721, 'learning_rate': 2.2134594081339335e-06, 'epoch': 0.79} + + 79%|███████▉ | 5833/7378 [20:00:19<5:13:01, 12.16s/it] + 79%|███████▉ | 5834/7378 [20:00:31<5:11:04, 12.09s/it] + +{'loss': 0.4391, 'learning_rate': 2.2107055425893052e-06, 'epoch': 0.79} + + 79%|███████▉ | 5834/7378 [20:00:31<5:11:04, 12.09s/it] + 79%|███████▉ | 5835/7378 [20:00:43<5:10:33, 12.08s/it] + +{'loss': 0.4399, 'learning_rate': 2.207953178309116e-06, 'epoch': 0.79} + + 79%|███████▉ | 5835/7378 [20:00:43<5:10:33, 12.08s/it] + 79%|███████▉ | 5836/7378 [20:00:56<5:11:38, 12.13s/it] + +{'loss': 0.4888, 'learning_rate': 2.2052023158238366e-06, 'epoch': 0.79} + + 79%|███████▉ | 5836/7378 [20:00:56<5:11:38, 12.13s/it] + 79%|███████▉ | 5837/7378 [20:01:07<5:08:39, 12.02s/it] + +{'loss': 0.4047, 'learning_rate': 2.202452955663653e-06, 'epoch': 0.79} + + 79%|███████▉ | 5837/7378 [20:01:07<5:08:39, 12.02s/it] + 79%|███████▉ | 5838/7378 [20:01:20<5:12:44, 12.18s/it] + +{'loss': 0.4916, 'learning_rate': 2.1997050983584588e-06, 'epoch': 0.79} + + 79%|███████▉ | 5838/7378 [20:01:20<5:12:44, 12.18s/it] + 79%|███████▉ | 5839/7378 [20:01:32<5:12:35, 12.19s/it] + +{'loss': 0.4269, 'learning_rate': 2.196958744437866e-06, 'epoch': 0.79} + + 79%|███████▉ | 5839/7378 [20:01:32<5:12:35, 12.19s/it] + 79%|███████▉ | 5840/7378 [20:01:45<5:13:36, 12.23s/it] + +{'loss': 0.3915, 'learning_rate': 2.1942138944311875e-06, 'epoch': 0.79} + + 79%|███████▉ | 5840/7378 [20:01:45<5:13:36, 12.23s/it] + 79%|███████▉ | 5841/7378 [20:01:57<5:14:15, 12.27s/it] + +{'loss': 0.3542, 'learning_rate': 2.1914705488674515e-06, 'epoch': 0.79} + + 79%|███████▉ | 5841/7378 [20:01:57<5:14:15, 12.27s/it] + 79%|███████▉ | 5842/7378 [20:02:09<5:15:18, 12.32s/it] + +{'loss': 0.4467, 'learning_rate': 2.188728708275395e-06, 'epoch': 0.79} + + 79%|███████▉ | 5842/7378 [20:02:09<5:15:18, 12.32s/it] + 79%|███████▉ | 5843/7378 [20:02:22<5:19:13, 12.48s/it] + +{'loss': 
0.4188, 'learning_rate': 2.185988373183461e-06, 'epoch': 0.79} + + 79%|███████▉ | 5843/7378 [20:02:22<5:19:13, 12.48s/it] + 79%|███████▉ | 5844/7378 [20:02:35<5:18:48, 12.47s/it] + +{'loss': 0.4758, 'learning_rate': 2.183249544119811e-06, 'epoch': 0.79} + + 79%|███████▉ | 5844/7378 [20:02:35<5:18:48, 12.47s/it] + 79%|███████▉ | 5845/7378 [20:02:47<5:15:53, 12.36s/it] + +{'loss': 0.4281, 'learning_rate': 2.180512221612311e-06, 'epoch': 0.79} + + 79%|███████▉ | 5845/7378 [20:02:47<5:15:53, 12.36s/it] + 79%|███████▉ | 5846/7378 [20:03:00<5:19:01, 12.49s/it] + +{'loss': 0.4676, 'learning_rate': 2.177776406188534e-06, 'epoch': 0.79} + + 79%|███████▉ | 5846/7378 [20:03:00<5:19:01, 12.49s/it] + 79%|███████▉ | 5847/7378 [20:03:12<5:17:56, 12.46s/it] + +{'loss': 0.4543, 'learning_rate': 2.175042098375766e-06, 'epoch': 0.79} + + 79%|███████▉ | 5847/7378 [20:03:12<5:17:56, 12.46s/it] + 79%|███████▉ | 5848/7378 [20:03:24<5:13:56, 12.31s/it] + +{'loss': 0.4335, 'learning_rate': 2.1723092987010053e-06, 'epoch': 0.79} + + 79%|███████▉ | 5848/7378 [20:03:24<5:13:56, 12.31s/it] + 79%|███████▉ | 5849/7378 [20:03:36<5:14:22, 12.34s/it] + +{'loss': 0.3956, 'learning_rate': 2.1695780076909543e-06, 'epoch': 0.79} + + 79%|███████▉ | 5849/7378 [20:03:36<5:14:22, 12.34s/it] + 79%|███████▉ | 5850/7378 [20:03:49<5:15:59, 12.41s/it] + +{'loss': 0.4551, 'learning_rate': 2.166848225872026e-06, 'epoch': 0.79} + + 79%|███████▉ | 5850/7378 [20:03:49<5:15:59, 12.41s/it] + 79%|███████▉ | 5851/7378 [20:04:01<5:13:08, 12.30s/it] + +{'loss': 0.4213, 'learning_rate': 2.164119953770344e-06, 'epoch': 0.79} + + 79%|███████▉ | 5851/7378 [20:04:01<5:13:08, 12.30s/it] + 79%|███████▉ | 5852/7378 [20:04:13<5:11:05, 12.23s/it] + +{'loss': 0.4163, 'learning_rate': 2.161393191911736e-06, 'epoch': 0.79} + + 79%|███████▉ | 5852/7378 [20:04:13<5:11:05, 12.23s/it] + 79%|███████▉ | 5853/7378 [20:04:25<5:10:50, 12.23s/it] + +{'loss': 0.434, 'learning_rate': 2.1586679408217494e-06, 'epoch': 0.79} + + 79%|███████▉ | 
5853/7378 [20:04:25<5:10:50, 12.23s/it] + 79%|███████▉ | 5854/7378 [20:04:37<5:10:27, 12.22s/it] + +{'loss': 0.4742, 'learning_rate': 2.1559442010256292e-06, 'epoch': 0.79} + + 79%|███████▉ | 5854/7378 [20:04:37<5:10:27, 12.22s/it] + 79%|███████▉ | 5855/7378 [20:04:50<5:16:33, 12.47s/it] + +{'loss': 0.4157, 'learning_rate': 2.153221973048335e-06, 'epoch': 0.79} + + 79%|███████▉ | 5855/7378 [20:04:50<5:16:33, 12.47s/it] + 79%|███████▉ | 5856/7378 [20:05:03<5:16:41, 12.48s/it] + +{'loss': 0.483, 'learning_rate': 2.1505012574145335e-06, 'epoch': 0.79} + + 79%|███████▉ | 5856/7378 [20:05:03<5:16:41, 12.48s/it] + 79%|███████▉ | 5857/7378 [20:05:15<5:15:49, 12.46s/it] + +{'loss': 0.3789, 'learning_rate': 2.147782054648597e-06, 'epoch': 0.79} + + 79%|███████▉ | 5857/7378 [20:05:15<5:15:49, 12.46s/it] + 79%|███████▉ | 5858/7378 [20:05:27<5:12:04, 12.32s/it] + +{'loss': 0.451, 'learning_rate': 2.145064365274615e-06, 'epoch': 0.79} + + 79%|███████▉ | 5858/7378 [20:05:27<5:12:04, 12.32s/it] + 79%|███████▉ | 5859/7378 [20:05:40<5:15:38, 12.47s/it] + +{'loss': 0.441, 'learning_rate': 2.1423481898163754e-06, 'epoch': 0.79} + + 79%|███████▉ | 5859/7378 [20:05:40<5:15:38, 12.47s/it] + 79%|███████▉ | 5860/7378 [20:05:53<5:14:33, 12.43s/it] + +{'loss': 0.4038, 'learning_rate': 2.13963352879738e-06, 'epoch': 0.79} + + 79%|███████▉ | 5860/7378 [20:05:53<5:14:33, 12.43s/it] + 79%|███████▉ | 5861/7378 [20:06:05<5:12:54, 12.38s/it] + +{'loss': 0.4871, 'learning_rate': 2.1369203827408348e-06, 'epoch': 0.79} + + 79%|███████▉ | 5861/7378 [20:06:05<5:12:54, 12.38s/it] + 79%|███████▉ | 5862/7378 [20:06:17<5:11:47, 12.34s/it] + +{'loss': 0.483, 'learning_rate': 2.1342087521696597e-06, 'epoch': 0.79} + + 79%|███████▉ | 5862/7378 [20:06:17<5:11:47, 12.34s/it] + 79%|███████▉ | 5863/7378 [20:06:30<5:12:58, 12.39s/it] + +{'loss': 0.3988, 'learning_rate': 2.131498637606477e-06, 'epoch': 0.79} + + 79%|███████▉ | 5863/7378 [20:06:30<5:12:58, 12.39s/it] + 79%|███████▉ | 5864/7378 [20:06:41<5:08:25, 
12.22s/it] + +{'loss': 0.4066, 'learning_rate': 2.1287900395736207e-06, 'epoch': 0.79} + + 79%|███████▉ | 5864/7378 [20:06:41<5:08:25, 12.22s/it] + 79%|███████▉ | 5865/7378 [20:06:54<5:12:17, 12.38s/it] + +{'loss': 0.4676, 'learning_rate': 2.1260829585931277e-06, 'epoch': 0.79} + + 79%|███████▉ | 5865/7378 [20:06:54<5:12:17, 12.38s/it] + 80%|███████▉ | 5866/7378 [20:07:06<5:09:38, 12.29s/it] + +{'loss': 0.4259, 'learning_rate': 2.1233773951867442e-06, 'epoch': 0.8} + + 80%|███████▉ | 5866/7378 [20:07:06<5:09:38, 12.29s/it] + 80%|███████▉ | 5867/7378 [20:07:19<5:10:28, 12.33s/it] + +{'loss': 0.4774, 'learning_rate': 2.1206733498759312e-06, 'epoch': 0.8} + + 80%|███████▉ | 5867/7378 [20:07:19<5:10:28, 12.33s/it] + 80%|███████▉ | 5868/7378 [20:07:32<5:15:35, 12.54s/it] + +{'loss': 0.4327, 'learning_rate': 2.117970823181846e-06, 'epoch': 0.8} + + 80%|███████▉ | 5868/7378 [20:07:32<5:15:35, 12.54s/it] + 80%|███████▉ | 5869/7378 [20:07:44<5:11:50, 12.40s/it] + +{'loss': 0.4442, 'learning_rate': 2.1152698156253617e-06, 'epoch': 0.8} + + 80%|███████▉ | 5869/7378 [20:07:44<5:11:50, 12.40s/it] + 80%|███████▉ | 5870/7378 [20:07:56<5:10:27, 12.35s/it] + +{'loss': 0.4163, 'learning_rate': 2.1125703277270502e-06, 'epoch': 0.8} + + 80%|███████▉ | 5870/7378 [20:07:56<5:10:27, 12.35s/it] + 80%|███████▉ | 5871/7378 [20:08:08<5:11:16, 12.39s/it] + +{'loss': 0.4239, 'learning_rate': 2.1098723600072015e-06, 'epoch': 0.8} + + 80%|███████▉ | 5871/7378 [20:08:08<5:11:16, 12.39s/it] + 80%|███████▉ | 5872/7378 [20:08:21<5:10:28, 12.37s/it] + +{'loss': 0.4768, 'learning_rate': 2.1071759129858026e-06, 'epoch': 0.8} + + 80%|███████▉ | 5872/7378 [20:08:21<5:10:28, 12.37s/it] + 80%|███████▉ | 5873/7378 [20:08:33<5:08:48, 12.31s/it] + +{'loss': 0.4358, 'learning_rate': 2.104480987182551e-06, 'epoch': 0.8} + + 80%|███████▉ | 5873/7378 [20:08:33<5:08:48, 12.31s/it] + 80%|███████▉ | 5874/7378 [20:08:45<5:09:22, 12.34s/it] + +{'loss': 0.4561, 'learning_rate': 2.1017875831168553e-06, 'epoch': 0.8} + + 
80%|███████▉ | 5874/7378 [20:08:45<5:09:22, 12.34s/it] + 80%|███████▉ | 5875/7378 [20:08:58<5:08:10, 12.30s/it] + +{'loss': 0.4251, 'learning_rate': 2.099095701307824e-06, 'epoch': 0.8} + + 80%|███████▉ | 5875/7378 [20:08:58<5:08:10, 12.30s/it] + 80%|███████▉ | 5876/7378 [20:09:10<5:07:01, 12.26s/it] + +{'loss': 0.4223, 'learning_rate': 2.0964053422742736e-06, 'epoch': 0.8} + + 80%|███████▉ | 5876/7378 [20:09:10<5:07:01, 12.26s/it] + 80%|███████▉ | 5877/7378 [20:09:22<5:08:51, 12.35s/it] + +{'loss': 0.4319, 'learning_rate': 2.093716506534733e-06, 'epoch': 0.8} + + 80%|███████▉ | 5877/7378 [20:09:22<5:08:51, 12.35s/it] + 80%|███████▉ | 5878/7378 [20:09:35<5:08:15, 12.33s/it] + +{'loss': 0.5216, 'learning_rate': 2.0910291946074312e-06, 'epoch': 0.8} + + 80%|███████▉ | 5878/7378 [20:09:35<5:08:15, 12.33s/it] + 80%|███████▉ | 5879/7378 [20:09:47<5:05:35, 12.23s/it] + +{'loss': 0.4664, 'learning_rate': 2.088343407010306e-06, 'epoch': 0.8} + + 80%|███████▉ | 5879/7378 [20:09:47<5:05:35, 12.23s/it] + 80%|███████▉ | 5880/7378 [20:09:59<5:05:01, 12.22s/it] + +{'loss': 0.4497, 'learning_rate': 2.0856591442609965e-06, 'epoch': 0.8} + + 80%|███████▉ | 5880/7378 [20:09:59<5:05:01, 12.22s/it] + 80%|███████▉ | 5881/7378 [20:10:11<5:03:06, 12.15s/it] + +{'loss': 0.4233, 'learning_rate': 2.0829764068768586e-06, 'epoch': 0.8} + + 80%|███████▉ | 5881/7378 [20:10:11<5:03:06, 12.15s/it] + 80%|███████▉ | 5882/7378 [20:10:23<5:04:31, 12.21s/it] + +{'loss': 0.4047, 'learning_rate': 2.080295195374945e-06, 'epoch': 0.8} + + 80%|███████▉ | 5882/7378 [20:10:23<5:04:31, 12.21s/it] + 80%|███████▉ | 5883/7378 [20:10:35<5:02:56, 12.16s/it] + +{'loss': 0.3991, 'learning_rate': 2.0776155102720174e-06, 'epoch': 0.8} + + 80%|███████▉ | 5883/7378 [20:10:35<5:02:56, 12.16s/it] + 80%|███████▉ | 5884/7378 [20:10:47<5:00:23, 12.06s/it] + +{'loss': 0.3982, 'learning_rate': 2.074937352084543e-06, 'epoch': 0.8} + + 80%|███████▉ | 5884/7378 [20:10:47<5:00:23, 12.06s/it] + 80%|███████▉ | 5885/7378 
[20:10:59<5:00:49, 12.09s/it] + +{'loss': 0.4415, 'learning_rate': 2.0722607213286917e-06, 'epoch': 0.8} + + 80%|███████▉ | 5885/7378 [20:10:59<5:00:49, 12.09s/it] + 80%|███████▉ | 5886/7378 [20:11:11<5:01:22, 12.12s/it] + +{'loss': 0.4246, 'learning_rate': 2.069585618520349e-06, 'epoch': 0.8} + + 80%|███████▉ | 5886/7378 [20:11:11<5:01:22, 12.12s/it] + 80%|███████▉ | 5887/7378 [20:11:24<5:05:06, 12.28s/it] + +{'loss': 0.4167, 'learning_rate': 2.0669120441750945e-06, 'epoch': 0.8} + + 80%|███████▉ | 5887/7378 [20:11:24<5:05:06, 12.28s/it] + 80%|███████▉ | 5888/7378 [20:11:36<5:06:27, 12.34s/it] + +{'loss': 0.4689, 'learning_rate': 2.0642399988082186e-06, 'epoch': 0.8} + + 80%|███████▉ | 5888/7378 [20:11:36<5:06:27, 12.34s/it] + 80%|███████▉ | 5889/7378 [20:11:48<5:02:52, 12.20s/it] + +{'loss': 0.428, 'learning_rate': 2.061569482934713e-06, 'epoch': 0.8} + + 80%|███████▉ | 5889/7378 [20:11:48<5:02:52, 12.20s/it] + 80%|███████▉ | 5890/7378 [20:12:01<5:03:13, 12.23s/it] + +{'loss': 0.387, 'learning_rate': 2.058900497069284e-06, 'epoch': 0.8} + + 80%|███████▉ | 5890/7378 [20:12:01<5:03:13, 12.23s/it] + 80%|███████▉ | 5891/7378 [20:12:13<5:05:11, 12.31s/it] + +{'loss': 0.4831, 'learning_rate': 2.0562330417263344e-06, 'epoch': 0.8} + + 80%|███████▉ | 5891/7378 [20:12:13<5:05:11, 12.31s/it] + 80%|███████▉ | 5892/7378 [20:12:25<5:01:30, 12.17s/it] + +{'loss': 0.4088, 'learning_rate': 2.0535671174199722e-06, 'epoch': 0.8} + + 80%|███████▉ | 5892/7378 [20:12:25<5:01:30, 12.17s/it] + 80%|███████▉ | 5893/7378 [20:12:37<5:00:12, 12.13s/it] + +{'loss': 0.4187, 'learning_rate': 2.0509027246640157e-06, 'epoch': 0.8} + + 80%|███████▉ | 5893/7378 [20:12:37<5:00:12, 12.13s/it] + 80%|███████▉ | 5894/7378 [20:12:49<4:59:58, 12.13s/it] + +{'loss': 0.4312, 'learning_rate': 2.048239863971979e-06, 'epoch': 0.8} + + 80%|███████▉ | 5894/7378 [20:12:49<4:59:58, 12.13s/it] + 80%|███████▉ | 5895/7378 [20:13:01<4:56:51, 12.01s/it] + +{'loss': 0.402, 'learning_rate': 2.0455785358570945e-06, 
'epoch': 0.8} + + 80%|███████▉ | 5895/7378 [20:13:01<4:56:51, 12.01s/it] + 80%|███████▉ | 5896/7378 [20:13:13<4:59:09, 12.11s/it] + +{'loss': 0.4306, 'learning_rate': 2.042918740832288e-06, 'epoch': 0.8} + + 80%|███████▉ | 5896/7378 [20:13:13<4:59:09, 12.11s/it] + 80%|███████▉ | 5897/7378 [20:13:26<5:01:52, 12.23s/it] + +{'loss': 0.4966, 'learning_rate': 2.040260479410193e-06, 'epoch': 0.8} + + 80%|███████▉ | 5897/7378 [20:13:26<5:01:52, 12.23s/it] + 80%|███████▉ | 5898/7378 [20:13:38<5:00:40, 12.19s/it] + +{'loss': 0.4438, 'learning_rate': 2.0376037521031456e-06, 'epoch': 0.8} + + 80%|███████▉ | 5898/7378 [20:13:38<5:00:40, 12.19s/it] + 80%|███████▉ | 5899/7378 [20:13:50<5:04:01, 12.33s/it] + +{'loss': 0.448, 'learning_rate': 2.034948559423193e-06, 'epoch': 0.8} + + 80%|███████▉ | 5899/7378 [20:13:50<5:04:01, 12.33s/it] + 80%|███████▉ | 5900/7378 [20:14:03<5:01:43, 12.25s/it] + +{'loss': 0.4555, 'learning_rate': 2.0322949018820802e-06, 'epoch': 0.8} + + 80%|███████▉ | 5900/7378 [20:14:03<5:01:43, 12.25s/it] + 80%|███████▉ | 5901/7378 [20:14:15<5:04:58, 12.39s/it] + +{'loss': 0.4322, 'learning_rate': 2.0296427799912575e-06, 'epoch': 0.8} + + 80%|███████▉ | 5901/7378 [20:14:15<5:04:58, 12.39s/it] + 80%|███████▉ | 5902/7378 [20:14:28<5:05:20, 12.41s/it] + +{'loss': 0.417, 'learning_rate': 2.02699219426188e-06, 'epoch': 0.8} + + 80%|███████▉ | 5902/7378 [20:14:28<5:05:20, 12.41s/it] + 80%|████████ | 5903/7378 [20:14:40<5:04:19, 12.38s/it] + +{'loss': 0.4049, 'learning_rate': 2.0243431452048036e-06, 'epoch': 0.8} + + 80%|████████ | 5903/7378 [20:14:40<5:04:19, 12.38s/it] + 80%|████████ | 5904/7378 [20:14:53<5:05:43, 12.44s/it] + +{'loss': 0.4319, 'learning_rate': 2.021695633330596e-06, 'epoch': 0.8} + + 80%|████████ | 5904/7378 [20:14:53<5:05:43, 12.44s/it] + 80%|████████ | 5905/7378 [20:15:05<5:07:23, 12.52s/it] + +{'loss': 0.4398, 'learning_rate': 2.0190496591495224e-06, 'epoch': 0.8} + + 80%|████████ | 5905/7378 [20:15:05<5:07:23, 12.52s/it] + 80%|████████ | 
5906/7378 [20:15:18<5:09:56, 12.63s/it] + +{'loss': 0.4917, 'learning_rate': 2.0164052231715516e-06, 'epoch': 0.8} + + 80%|████████ | 5906/7378 [20:15:18<5:09:56, 12.63s/it] + 80%|████████ | 5907/7378 [20:15:30<5:06:19, 12.49s/it] + +{'loss': 0.4157, 'learning_rate': 2.0137623259063576e-06, 'epoch': 0.8} + + 80%|████████ | 5907/7378 [20:15:30<5:06:19, 12.49s/it] + 80%|████████ | 5908/7378 [20:15:42<5:02:29, 12.35s/it] + +{'loss': 0.4297, 'learning_rate': 2.0111209678633147e-06, 'epoch': 0.8} + + 80%|████████ | 5908/7378 [20:15:42<5:02:29, 12.35s/it] + 80%|████████ | 5909/7378 [20:15:55<5:03:11, 12.38s/it] + +{'loss': 0.4535, 'learning_rate': 2.0084811495515087e-06, 'epoch': 0.8} + + 80%|████████ | 5909/7378 [20:15:55<5:03:11, 12.38s/it] + 80%|████████ | 5910/7378 [20:16:07<5:01:41, 12.33s/it] + +{'loss': 0.4421, 'learning_rate': 2.0058428714797206e-06, 'epoch': 0.8} + + 80%|████████ | 5910/7378 [20:16:07<5:01:41, 12.33s/it] + 80%|████████ | 5911/7378 [20:16:20<5:02:56, 12.39s/it] + +{'loss': 0.4234, 'learning_rate': 2.003206134156437e-06, 'epoch': 0.8} + + 80%|████████ | 5911/7378 [20:16:20<5:02:56, 12.39s/it] + 80%|████████ | 5912/7378 [20:16:32<5:04:22, 12.46s/it] + +{'loss': 0.4492, 'learning_rate': 2.0005709380898454e-06, 'epoch': 0.8} + + 80%|████████ | 5912/7378 [20:16:32<5:04:22, 12.46s/it] + 80%|████████ | 5913/7378 [20:16:44<5:01:34, 12.35s/it] + +{'loss': 0.4227, 'learning_rate': 1.997937283787843e-06, 'epoch': 0.8} + + 80%|████████ | 5913/7378 [20:16:44<5:01:34, 12.35s/it] + 80%|████████ | 5914/7378 [20:16:57<5:00:54, 12.33s/it] + +{'loss': 0.3407, 'learning_rate': 1.995305171758023e-06, 'epoch': 0.8} + + 80%|████████ | 5914/7378 [20:16:57<5:00:54, 12.33s/it] + 80%|████████ | 5915/7378 [20:17:09<4:59:37, 12.29s/it] + +{'loss': 0.3985, 'learning_rate': 1.992674602507685e-06, 'epoch': 0.8} + + 80%|████████ | 5915/7378 [20:17:09<4:59:37, 12.29s/it] + 80%|████████ | 5916/7378 [20:17:21<4:57:50, 12.22s/it] + +{'loss': 0.3654, 'learning_rate': 
1.9900455765438288e-06, 'epoch': 0.8} + + 80%|████████ | 5916/7378 [20:17:21<4:57:50, 12.22s/it] + 80%|████████ | 5917/7378 [20:17:33<4:59:16, 12.29s/it] + +{'loss': 0.4181, 'learning_rate': 1.987418094373155e-06, 'epoch': 0.8} + + 80%|████████ | 5917/7378 [20:17:33<4:59:16, 12.29s/it] + 80%|████████ | 5918/7378 [20:17:46<4:59:53, 12.32s/it] + +{'loss': 0.4108, 'learning_rate': 1.9847921565020724e-06, 'epoch': 0.8} + + 80%|████████ | 5918/7378 [20:17:46<4:59:53, 12.32s/it] + 80%|████████ | 5919/7378 [20:17:58<5:00:10, 12.34s/it] + +{'loss': 0.3926, 'learning_rate': 1.9821677634366932e-06, 'epoch': 0.8} + + 80%|████████ | 5919/7378 [20:17:58<5:00:10, 12.34s/it] + 80%|████████ | 5920/7378 [20:18:10<4:59:58, 12.34s/it] + +{'loss': 0.4721, 'learning_rate': 1.979544915682824e-06, 'epoch': 0.8} + + 80%|████████ | 5920/7378 [20:18:10<4:59:58, 12.34s/it] + 80%|████████ | 5921/7378 [20:18:23<4:58:45, 12.30s/it] + +{'loss': 0.3733, 'learning_rate': 1.9769236137459778e-06, 'epoch': 0.8} + + 80%|████████ | 5921/7378 [20:18:23<4:58:45, 12.30s/it] + 80%|████████ | 5922/7378 [20:18:35<4:58:47, 12.31s/it] + +{'loss': 0.4082, 'learning_rate': 1.974303858131369e-06, 'epoch': 0.8} + + 80%|████████ | 5922/7378 [20:18:35<4:58:47, 12.31s/it] + 80%|████████ | 5923/7378 [20:18:47<4:58:47, 12.32s/it] + +{'loss': 0.4435, 'learning_rate': 1.971685649343916e-06, 'epoch': 0.8} + + 80%|████████ | 5923/7378 [20:18:47<4:58:47, 12.32s/it] + 80%|████████ | 5924/7378 [20:19:00<5:01:09, 12.43s/it] + +{'loss': 0.4482, 'learning_rate': 1.9690689878882375e-06, 'epoch': 0.8} + + 80%|████████ | 5924/7378 [20:19:00<5:01:09, 12.43s/it] + 80%|████████ | 5925/7378 [20:19:12<5:00:28, 12.41s/it] + +{'loss': 0.4197, 'learning_rate': 1.9664538742686533e-06, 'epoch': 0.8} + + 80%|████████ | 5925/7378 [20:19:12<5:00:28, 12.41s/it] + 80%|████████ | 5926/7378 [20:19:25<4:58:35, 12.34s/it] + +{'loss': 0.4102, 'learning_rate': 1.9638403089891857e-06, 'epoch': 0.8} + + 80%|████████ | 5926/7378 [20:19:25<4:58:35, 
12.34s/it] + 80%|████████ | 5927/7378 [20:19:37<4:57:32, 12.30s/it] + +{'loss': 0.4884, 'learning_rate': 1.961228292553555e-06, 'epoch': 0.8} + + 80%|████████ | 5927/7378 [20:19:37<4:57:32, 12.30s/it] + 80%|████████ | 5928/7378 [20:19:49<4:56:00, 12.25s/it] + +{'loss': 0.3555, 'learning_rate': 1.9586178254651934e-06, 'epoch': 0.8} + + 80%|████████ | 5928/7378 [20:19:49<4:56:00, 12.25s/it] + 80%|████████ | 5929/7378 [20:20:01<4:56:33, 12.28s/it] + +{'loss': 0.4679, 'learning_rate': 1.956008908227224e-06, 'epoch': 0.8} + + 80%|████████ | 5929/7378 [20:20:01<4:56:33, 12.28s/it] + 80%|████████ | 5930/7378 [20:20:13<4:56:11, 12.27s/it] + +{'loss': 0.4402, 'learning_rate': 1.9534015413424734e-06, 'epoch': 0.8} + + 80%|████████ | 5930/7378 [20:20:14<4:56:11, 12.27s/it] + 80%|████████ | 5931/7378 [20:20:26<4:55:02, 12.23s/it] + +{'loss': 0.4667, 'learning_rate': 1.95079572531347e-06, 'epoch': 0.8} + + 80%|████████ | 5931/7378 [20:20:26<4:55:02, 12.23s/it] + 80%|████████ | 5932/7378 [20:20:38<4:55:19, 12.25s/it] + +{'loss': 0.4583, 'learning_rate': 1.948191460642448e-06, 'epoch': 0.8} + + 80%|████████ | 5932/7378 [20:20:38<4:55:19, 12.25s/it] + 80%|████████ | 5933/7378 [20:20:50<4:52:58, 12.17s/it] + +{'loss': 0.4798, 'learning_rate': 1.9455887478313374e-06, 'epoch': 0.8} + + 80%|████████ | 5933/7378 [20:20:50<4:52:58, 12.17s/it] + 80%|████████ | 5934/7378 [20:21:02<4:52:32, 12.16s/it] + +{'loss': 0.3843, 'learning_rate': 1.942987587381768e-06, 'epoch': 0.8} + + 80%|████████ | 5934/7378 [20:21:02<4:52:32, 12.16s/it] + 80%|████████ | 5935/7378 [20:21:14<4:51:58, 12.14s/it] + +{'loss': 0.4604, 'learning_rate': 1.9403879797950753e-06, 'epoch': 0.8} + + 80%|████████ | 5935/7378 [20:21:14<4:51:58, 12.14s/it] + 80%|████████ | 5936/7378 [20:21:26<4:52:32, 12.17s/it] + +{'loss': 0.4724, 'learning_rate': 1.9377899255722886e-06, 'epoch': 0.8} + + 80%|████████ | 5936/7378 [20:21:26<4:52:32, 12.17s/it] + 80%|████████ | 5937/7378 [20:21:39<4:52:05, 12.16s/it] + +{'loss': 0.405, 
'learning_rate': 1.935193425214148e-06, 'epoch': 0.8} + + 80%|████████ | 5937/7378 [20:21:39<4:52:05, 12.16s/it] + 80%|████████ | 5938/7378 [20:21:51<4:56:01, 12.33s/it] + +{'loss': 0.4185, 'learning_rate': 1.932598479221085e-06, 'epoch': 0.8} + + 80%|████████ | 5938/7378 [20:21:51<4:56:01, 12.33s/it] + 80%|████████ | 5939/7378 [20:22:04<4:59:48, 12.50s/it] + +{'loss': 0.4656, 'learning_rate': 1.9300050880932354e-06, 'epoch': 0.8} + + 80%|████████ | 5939/7378 [20:22:04<4:59:48, 12.50s/it] + 81%|████████ | 5940/7378 [20:22:16<4:55:47, 12.34s/it] + +{'loss': 0.406, 'learning_rate': 1.9274132523304324e-06, 'epoch': 0.81} + + 81%|████████ | 5940/7378 [20:22:16<4:55:47, 12.34s/it] + 81%|████████ | 5941/7378 [20:22:28<4:54:28, 12.30s/it] + +{'loss': 0.4912, 'learning_rate': 1.9248229724322164e-06, 'epoch': 0.81} + + 81%|████████ | 5941/7378 [20:22:28<4:54:28, 12.30s/it] + 81%|████████ | 5942/7378 [20:22:41<4:54:43, 12.31s/it] + +{'loss': 0.4263, 'learning_rate': 1.9222342488978208e-06, 'epoch': 0.81} + + 81%|████████ | 5942/7378 [20:22:41<4:54:43, 12.31s/it] + 81%|████████ | 5943/7378 [20:22:53<4:57:49, 12.45s/it] + +{'loss': 0.4445, 'learning_rate': 1.9196470822261816e-06, 'epoch': 0.81} + + 81%|████████ | 5943/7378 [20:22:53<4:57:49, 12.45s/it] + 81%|████████ | 5944/7378 [20:23:06<4:55:18, 12.36s/it] + +{'loss': 0.4781, 'learning_rate': 1.917061472915933e-06, 'epoch': 0.81} + + 81%|████████ | 5944/7378 [20:23:06<4:55:18, 12.36s/it] + 81%|████████ | 5945/7378 [20:23:18<4:55:02, 12.35s/it] + +{'loss': 0.4237, 'learning_rate': 1.9144774214654118e-06, 'epoch': 0.81} + + 81%|████████ | 5945/7378 [20:23:18<4:55:02, 12.35s/it] + 81%|████████ | 5946/7378 [20:23:30<4:53:55, 12.32s/it] + +{'loss': 0.4469, 'learning_rate': 1.911894928372655e-06, 'epoch': 0.81} + + 81%|████████ | 5946/7378 [20:23:30<4:53:55, 12.32s/it] + 81%|████████ | 5947/7378 [20:23:43<4:55:17, 12.38s/it] + +{'loss': 0.5131, 'learning_rate': 1.9093139941353968e-06, 'epoch': 0.81} + + 81%|████████ | 5947/7378 
[20:23:43<4:55:17, 12.38s/it] + 81%|████████ | 5948/7378 [20:23:55<4:52:40, 12.28s/it] + +{'loss': 0.3975, 'learning_rate': 1.9067346192510705e-06, 'epoch': 0.81} + + 81%|████████ | 5948/7378 [20:23:55<4:52:40, 12.28s/it] + 81%|████████ | 5949/7378 [20:24:07<4:53:18, 12.32s/it] + +{'loss': 0.4309, 'learning_rate': 1.90415680421681e-06, 'epoch': 0.81} + + 81%|████████ | 5949/7378 [20:24:07<4:53:18, 12.32s/it] + 81%|████████ | 5950/7378 [20:24:19<4:52:17, 12.28s/it] + +{'loss': 0.4602, 'learning_rate': 1.9015805495294515e-06, 'epoch': 0.81} + + 81%|████████ | 5950/7378 [20:24:19<4:52:17, 12.28s/it] + 81%|████████ | 5951/7378 [20:24:32<4:52:04, 12.28s/it] + +{'loss': 0.4163, 'learning_rate': 1.8990058556855274e-06, 'epoch': 0.81} + + 81%|████████ | 5951/7378 [20:24:32<4:52:04, 12.28s/it] + 81%|████████ | 5952/7378 [20:24:44<4:50:49, 12.24s/it] + +{'loss': 0.4508, 'learning_rate': 1.8964327231812674e-06, 'epoch': 0.81} + + 81%|████████ | 5952/7378 [20:24:44<4:50:49, 12.24s/it] + 81%|████████ | 5953/7378 [20:24:56<4:50:02, 12.21s/it] + +{'loss': 0.4932, 'learning_rate': 1.8938611525126026e-06, 'epoch': 0.81} + + 81%|████████ | 5953/7378 [20:24:56<4:50:02, 12.21s/it] + 81%|████████ | 5954/7378 [20:25:08<4:48:41, 12.16s/it] + +{'loss': 0.4609, 'learning_rate': 1.8912911441751625e-06, 'epoch': 0.81} + + 81%|████████ | 5954/7378 [20:25:08<4:48:41, 12.16s/it] + 81%|████████ | 5955/7378 [20:25:20<4:46:29, 12.08s/it] + +{'loss': 0.4232, 'learning_rate': 1.8887226986642792e-06, 'epoch': 0.81} + + 81%|████████ | 5955/7378 [20:25:20<4:46:29, 12.08s/it] + 81%|████████ | 5956/7378 [20:25:32<4:49:50, 12.23s/it] + +{'loss': 0.4234, 'learning_rate': 1.8861558164749782e-06, 'epoch': 0.81} + + 81%|████████ | 5956/7378 [20:25:32<4:49:50, 12.23s/it] + 81%|████████ | 5957/7378 [20:25:45<4:49:59, 12.24s/it] + +{'loss': 0.4732, 'learning_rate': 1.8835904981019858e-06, 'epoch': 0.81} + + 81%|████████ | 5957/7378 [20:25:45<4:49:59, 12.24s/it] + 81%|████████ | 5958/7378 [20:25:57<4:51:46, 
12.33s/it] + +{'loss': 0.4048, 'learning_rate': 1.8810267440397245e-06, 'epoch': 0.81} + + 81%|████████ | 5958/7378 [20:25:57<4:51:46, 12.33s/it] + 81%|████████ | 5959/7378 [20:26:10<4:53:23, 12.41s/it] + +{'loss': 0.4181, 'learning_rate': 1.8784645547823233e-06, 'epoch': 0.81} + + 81%|████████ | 5959/7378 [20:26:10<4:53:23, 12.41s/it] + 81%|████████ | 5960/7378 [20:26:22<4:51:28, 12.33s/it] + +{'loss': 0.4695, 'learning_rate': 1.8759039308235972e-06, 'epoch': 0.81} + + 81%|████████ | 5960/7378 [20:26:22<4:51:28, 12.33s/it] + 81%|████████ | 5961/7378 [20:26:35<4:52:46, 12.40s/it] + +{'loss': 0.4441, 'learning_rate': 1.8733448726570736e-06, 'epoch': 0.81} + + 81%|████████ | 5961/7378 [20:26:35<4:52:46, 12.40s/it] + 81%|████████ | 5962/7378 [20:26:47<4:50:33, 12.31s/it] + +{'loss': 0.4403, 'learning_rate': 1.8707873807759668e-06, 'epoch': 0.81} + + 81%|████████ | 5962/7378 [20:26:47<4:50:33, 12.31s/it] + 81%|████████ | 5963/7378 [20:26:59<4:49:16, 12.27s/it] + +{'loss': 0.3893, 'learning_rate': 1.868231455673194e-06, 'epoch': 0.81} + + 81%|████████ | 5963/7378 [20:26:59<4:49:16, 12.27s/it] + 81%|████████ | 5964/7378 [20:27:11<4:50:17, 12.32s/it] + +{'loss': 0.4375, 'learning_rate': 1.8656770978413674e-06, 'epoch': 0.81} + + 81%|████████ | 5964/7378 [20:27:11<4:50:17, 12.32s/it] + 81%|████████ | 5965/7378 [20:27:24<4:53:23, 12.46s/it] + +{'loss': 0.4176, 'learning_rate': 1.8631243077728045e-06, 'epoch': 0.81} + + 81%|████████ | 5965/7378 [20:27:24<4:53:23, 12.46s/it] + 81%|████████ | 5966/7378 [20:27:36<4:52:04, 12.41s/it] + +{'loss': 0.4854, 'learning_rate': 1.8605730859595116e-06, 'epoch': 0.81} + + 81%|████████ | 5966/7378 [20:27:36<4:52:04, 12.41s/it] + 81%|████████ | 5967/7378 [20:27:49<4:52:09, 12.42s/it] + +{'loss': 0.4328, 'learning_rate': 1.8580234328931979e-06, 'epoch': 0.81} + + 81%|████████ | 5967/7378 [20:27:49<4:52:09, 12.42s/it] + 81%|████████ | 5968/7378 [20:28:01<4:50:59, 12.38s/it] + +{'loss': 0.4365, 'learning_rate': 1.8554753490652689e-06, 'epoch': 
0.81} + + 81%|████████ | 5968/7378 [20:28:01<4:50:59, 12.38s/it] + 81%|████████ | 5969/7378 [20:28:13<4:49:10, 12.31s/it] + +{'loss': 0.4148, 'learning_rate': 1.8529288349668261e-06, 'epoch': 0.81} + + 81%|████████ | 5969/7378 [20:28:13<4:49:10, 12.31s/it] + 81%|████████ | 5970/7378 [20:28:25<4:47:23, 12.25s/it] + +{'loss': 0.4207, 'learning_rate': 1.850383891088674e-06, 'epoch': 0.81} + + 81%|████████ | 5970/7378 [20:28:25<4:47:23, 12.25s/it] + 81%|████████ | 5971/7378 [20:28:38<4:51:38, 12.44s/it] + +{'loss': 0.4246, 'learning_rate': 1.8478405179213076e-06, 'epoch': 0.81} + + 81%|████████ | 5971/7378 [20:28:38<4:51:38, 12.44s/it] + 81%|████████ | 5972/7378 [20:28:50<4:49:41, 12.36s/it] + +{'loss': 0.4446, 'learning_rate': 1.845298715954925e-06, 'epoch': 0.81} + + 81%|████████ | 5972/7378 [20:28:50<4:49:41, 12.36s/it] + 81%|████████ | 5973/7378 [20:29:03<4:49:03, 12.34s/it] + +{'loss': 0.3966, 'learning_rate': 1.8427584856794134e-06, 'epoch': 0.81} + + 81%|████████ | 5973/7378 [20:29:03<4:49:03, 12.34s/it] + 81%|████████ | 5974/7378 [20:29:15<4:50:55, 12.43s/it] + +{'loss': 0.5013, 'learning_rate': 1.8402198275843687e-06, 'epoch': 0.81} + + 81%|████████ | 5974/7378 [20:29:15<4:50:55, 12.43s/it] + 81%|████████ | 5975/7378 [20:29:28<4:52:49, 12.52s/it] + +{'loss': 0.432, 'learning_rate': 1.8376827421590737e-06, 'epoch': 0.81} + + 81%|████████ | 5975/7378 [20:29:28<4:52:49, 12.52s/it] + 81%|████████ | 5976/7378 [20:29:41<4:52:25, 12.51s/it] + +{'loss': 0.4902, 'learning_rate': 1.8351472298925143e-06, 'epoch': 0.81} + + 81%|████████ | 5976/7378 [20:29:41<4:52:25, 12.51s/it] + 81%|████████ | 5977/7378 [20:29:53<4:50:16, 12.43s/it] + +{'loss': 0.4133, 'learning_rate': 1.8326132912733685e-06, 'epoch': 0.81} + + 81%|████████ | 5977/7378 [20:29:53<4:50:16, 12.43s/it] + 81%|████████ | 5978/7378 [20:30:05<4:48:39, 12.37s/it] + +{'loss': 0.4206, 'learning_rate': 1.830080926790011e-06, 'epoch': 0.81} + + 81%|████████ | 5978/7378 [20:30:05<4:48:39, 12.37s/it] + 81%|████████ | 
5979/7378 [20:30:17<4:48:07, 12.36s/it] + +{'loss': 0.4302, 'learning_rate': 1.8275501369305214e-06, 'epoch': 0.81} + + 81%|████████ | 5979/7378 [20:30:17<4:48:07, 12.36s/it] + 81%|████████ | 5980/7378 [20:30:30<4:47:42, 12.35s/it] + +{'loss': 0.4774, 'learning_rate': 1.8250209221826675e-06, 'epoch': 0.81} + + 81%|████████ | 5980/7378 [20:30:30<4:47:42, 12.35s/it] + 81%|████████ | 5981/7378 [20:30:42<4:47:50, 12.36s/it] + +{'loss': 0.4039, 'learning_rate': 1.8224932830339137e-06, 'epoch': 0.81} + + 81%|████████ | 5981/7378 [20:30:42<4:47:50, 12.36s/it] + 81%|████████ | 5982/7378 [20:30:54<4:45:50, 12.29s/it] + +{'loss': 0.4418, 'learning_rate': 1.8199672199714224e-06, 'epoch': 0.81} + + 81%|████████ | 5982/7378 [20:30:54<4:45:50, 12.29s/it] + 81%|████████ | 5983/7378 [20:31:06<4:45:40, 12.29s/it] + +{'loss': 0.4446, 'learning_rate': 1.8174427334820565e-06, 'epoch': 0.81} + + 81%|████████ | 5983/7378 [20:31:06<4:45:40, 12.29s/it] + 81%|████████ | 5984/7378 [20:31:18<4:43:48, 12.22s/it] + +{'loss': 0.4438, 'learning_rate': 1.8149198240523702e-06, 'epoch': 0.81} + + 81%|████████ | 5984/7378 [20:31:18<4:43:48, 12.22s/it] + 81%|████████ | 5985/7378 [20:31:31<4:48:31, 12.43s/it] + +{'loss': 0.4629, 'learning_rate': 1.8123984921686134e-06, 'epoch': 0.81} + + 81%|████████ | 5985/7378 [20:31:31<4:48:31, 12.43s/it] + 81%|████████ | 5986/7378 [20:31:43<4:45:30, 12.31s/it] + +{'loss': 0.3795, 'learning_rate': 1.8098787383167327e-06, 'epoch': 0.81} + + 81%|████████ | 5986/7378 [20:31:43<4:45:30, 12.31s/it] + 81%|████████ | 5987/7378 [20:31:56<4:46:33, 12.36s/it] + +{'loss': 0.4385, 'learning_rate': 1.807360562982371e-06, 'epoch': 0.81} + + 81%|████████ | 5987/7378 [20:31:56<4:46:33, 12.36s/it] + 81%|████████ | 5988/7378 [20:32:08<4:46:37, 12.37s/it] + +{'loss': 0.4331, 'learning_rate': 1.80484396665087e-06, 'epoch': 0.81} + + 81%|████████ | 5988/7378 [20:32:08<4:46:37, 12.37s/it] + 81%|████████ | 5989/7378 [20:32:21<4:49:01, 12.48s/it] + +{'loss': 0.4156, 'learning_rate': 
1.8023289498072626e-06, 'epoch': 0.81} + + 81%|████████ | 5989/7378 [20:32:21<4:49:01, 12.48s/it] + 81%|████████ | 5990/7378 [20:32:33<4:48:01, 12.45s/it] + +{'loss': 0.5036, 'learning_rate': 1.7998155129362783e-06, 'epoch': 0.81} + + 81%|████████ | 5990/7378 [20:32:33<4:48:01, 12.45s/it] + 81%|████████ | 5991/7378 [20:32:46<4:46:23, 12.39s/it] + +{'loss': 0.3899, 'learning_rate': 1.7973036565223411e-06, 'epoch': 0.81} + + 81%|████████ | 5991/7378 [20:32:46<4:46:23, 12.39s/it] + 81%|████████ | 5992/7378 [20:32:58<4:43:16, 12.26s/it] + +{'loss': 0.4157, 'learning_rate': 1.7947933810495755e-06, 'epoch': 0.81} + + 81%|████████ | 5992/7378 [20:32:58<4:43:16, 12.26s/it] + 81%|████████ | 5993/7378 [20:33:10<4:42:47, 12.25s/it] + +{'loss': 0.4676, 'learning_rate': 1.7922846870017974e-06, 'epoch': 0.81} + + 81%|████████ | 5993/7378 [20:33:10<4:42:47, 12.25s/it] + 81%|████████ | 5994/7378 [20:33:22<4:39:22, 12.11s/it] + +{'loss': 0.4562, 'learning_rate': 1.7897775748625169e-06, 'epoch': 0.81} + + 81%|████████ | 5994/7378 [20:33:22<4:39:22, 12.11s/it] + 81%|████████▏ | 5995/7378 [20:33:34<4:39:20, 12.12s/it] + +{'loss': 0.3944, 'learning_rate': 1.7872720451149406e-06, 'epoch': 0.81} + + 81%|████████▏ | 5995/7378 [20:33:34<4:39:20, 12.12s/it] + 81%|████████▏ | 5996/7378 [20:33:46<4:37:39, 12.05s/it] + +{'loss': 0.4159, 'learning_rate': 1.7847680982419668e-06, 'epoch': 0.81} + + 81%|████████▏ | 5996/7378 [20:33:46<4:37:39, 12.05s/it] + 81%|████████▏ | 5997/7378 [20:33:58<4:37:32, 12.06s/it] + +{'loss': 0.4015, 'learning_rate': 1.7822657347261985e-06, 'epoch': 0.81} + + 81%|████████▏ | 5997/7378 [20:33:58<4:37:32, 12.06s/it] + 81%|████████▏ | 5998/7378 [20:34:10<4:39:54, 12.17s/it] + +{'loss': 0.4102, 'learning_rate': 1.779764955049925e-06, 'epoch': 0.81} + + 81%|████████▏ | 5998/7378 [20:34:10<4:39:54, 12.17s/it] + 81%|████████▏ | 5999/7378 [20:34:23<4:41:08, 12.23s/it] + +{'loss': 0.4562, 'learning_rate': 1.7772657596951304e-06, 'epoch': 0.81} + + 81%|████████▏ | 5999/7378 
[20:34:23<4:41:08, 12.23s/it] + 81%|████████▏ | 6000/7378 [20:34:35<4:41:51, 12.27s/it] + +{'loss': 0.456, 'learning_rate': 1.7747681491434943e-06, 'epoch': 0.81} + + 81%|████████▏ | 6000/7378 [20:34:35<4:41:51, 12.27s/it] + 81%|████████▏ | 6001/7378 [20:34:47<4:42:18, 12.30s/it] + +{'loss': 0.4661, 'learning_rate': 1.7722721238763963e-06, 'epoch': 0.81} + + 81%|████████▏ | 6001/7378 [20:34:47<4:42:18, 12.30s/it] + 81%|████████▏ | 6002/7378 [20:35:00<4:41:31, 12.28s/it] + +{'loss': 0.4369, 'learning_rate': 1.7697776843749037e-06, 'epoch': 0.81} + + 81%|████████▏ | 6002/7378 [20:35:00<4:41:31, 12.28s/it] + 81%|████████▏ | 6003/7378 [20:35:12<4:40:14, 12.23s/it] + +{'loss': 0.4061, 'learning_rate': 1.767284831119782e-06, 'epoch': 0.81} + + 81%|████████▏ | 6003/7378 [20:35:12<4:40:14, 12.23s/it] + 81%|████████▏ | 6004/7378 [20:35:24<4:39:03, 12.19s/it] + +{'loss': 0.4868, 'learning_rate': 1.7647935645914848e-06, 'epoch': 0.81} + + 81%|████████▏ | 6004/7378 [20:35:24<4:39:03, 12.19s/it] + 81%|████████▏ | 6005/7378 [20:35:36<4:39:43, 12.22s/it] + +{'loss': 0.4451, 'learning_rate': 1.7623038852701724e-06, 'epoch': 0.81} + + 81%|████████▏ | 6005/7378 [20:35:36<4:39:43, 12.22s/it] + 81%|████████▏ | 6006/7378 [20:35:48<4:36:55, 12.11s/it] + +{'loss': 0.386, 'learning_rate': 1.759815793635683e-06, 'epoch': 0.81} + + 81%|████████▏ | 6006/7378 [20:35:48<4:36:55, 12.11s/it] + 81%|████████▏ | 6007/7378 [20:36:00<4:36:49, 12.12s/it] + +{'loss': 0.485, 'learning_rate': 1.7573292901675654e-06, 'epoch': 0.81} + + 81%|████████▏ | 6007/7378 [20:36:00<4:36:49, 12.12s/it] + 81%|████████▏ | 6008/7378 [20:36:12<4:37:48, 12.17s/it] + +{'loss': 0.3548, 'learning_rate': 1.7548443753450506e-06, 'epoch': 0.81} + + 81%|████████▏ | 6008/7378 [20:36:12<4:37:48, 12.17s/it] + 81%|████████▏ | 6009/7378 [20:36:25<4:40:42, 12.30s/it] + +{'loss': 0.4038, 'learning_rate': 1.7523610496470667e-06, 'epoch': 0.81} + + 81%|████████▏ | 6009/7378 [20:36:25<4:40:42, 12.30s/it] + 81%|████████▏ | 6010/7378 
[20:36:37<4:41:06, 12.33s/it] + +{'loss': 0.4676, 'learning_rate': 1.7498793135522329e-06, 'epoch': 0.81} + + 81%|████████▏ | 6010/7378 [20:36:37<4:41:06, 12.33s/it] + 81%|████████▏ | 6011/7378 [20:36:49<4:39:39, 12.27s/it] + +{'loss': 0.4523, 'learning_rate': 1.7473991675388714e-06, 'epoch': 0.81} + + 81%|████████▏ | 6011/7378 [20:36:49<4:39:39, 12.27s/it] + 81%|████████▏ | 6012/7378 [20:37:01<4:36:49, 12.16s/it] + +{'loss': 0.4705, 'learning_rate': 1.7449206120849881e-06, 'epoch': 0.81} + + 81%|████████▏ | 6012/7378 [20:37:01<4:36:49, 12.16s/it] + 81%|████████▏ | 6013/7378 [20:37:14<4:38:38, 12.25s/it] + +{'loss': 0.4549, 'learning_rate': 1.742443647668285e-06, 'epoch': 0.81} + + 81%|████████▏ | 6013/7378 [20:37:14<4:38:38, 12.25s/it] + 82%|████████▏ | 6014/7378 [20:37:26<4:38:29, 12.25s/it] + +{'loss': 0.4551, 'learning_rate': 1.7399682747661595e-06, 'epoch': 0.82} + + 82%|████████▏ | 6014/7378 [20:37:26<4:38:29, 12.25s/it] + 82%|████████▏ | 6015/7378 [20:37:38<4:37:50, 12.23s/it] + +{'loss': 0.4491, 'learning_rate': 1.7374944938556982e-06, 'epoch': 0.82} + + 82%|████████▏ | 6015/7378 [20:37:38<4:37:50, 12.23s/it] + 82%|████████▏ | 6016/7378 [20:37:51<4:37:49, 12.24s/it] + +{'loss': 0.4288, 'learning_rate': 1.7350223054136871e-06, 'epoch': 0.82} + + 82%|████████▏ | 6016/7378 [20:37:51<4:37:49, 12.24s/it] + 82%|████████▏ | 6017/7378 [20:38:02<4:34:53, 12.12s/it] + +{'loss': 0.4609, 'learning_rate': 1.7325517099166012e-06, 'epoch': 0.82} + + 82%|████████▏ | 6017/7378 [20:38:02<4:34:53, 12.12s/it] + 82%|████████▏ | 6018/7378 [20:38:15<4:36:19, 12.19s/it] + +{'loss': 0.4405, 'learning_rate': 1.730082707840608e-06, 'epoch': 0.82} + + 82%|████████▏ | 6018/7378 [20:38:15<4:36:19, 12.19s/it] + 82%|████████▏ | 6019/7378 [20:38:27<4:36:47, 12.22s/it] + +{'loss': 0.4387, 'learning_rate': 1.727615299661567e-06, 'epoch': 0.82} + + 82%|████████▏ | 6019/7378 [20:38:27<4:36:47, 12.22s/it] + 82%|████████▏ | 6020/7378 [20:38:39<4:36:43, 12.23s/it] + +{'loss': 0.4122, 
'learning_rate': 1.7251494858550366e-06, 'epoch': 0.82} + + 82%|████████▏ | 6020/7378 [20:38:39<4:36:43, 12.23s/it] + 82%|████████▏ | 6021/7378 [20:38:52<4:36:57, 12.25s/it] + +{'loss': 0.4443, 'learning_rate': 1.7226852668962625e-06, 'epoch': 0.82} + + 82%|████████▏ | 6021/7378 [20:38:52<4:36:57, 12.25s/it] + 82%|████████▏ | 6022/7378 [20:39:04<4:37:04, 12.26s/it] + +{'loss': 0.4819, 'learning_rate': 1.7202226432601833e-06, 'epoch': 0.82} + + 82%|████████▏ | 6022/7378 [20:39:04<4:37:04, 12.26s/it] + 82%|████████▏ | 6023/7378 [20:39:16<4:36:53, 12.26s/it] + +{'loss': 0.3873, 'learning_rate': 1.7177616154214316e-06, 'epoch': 0.82} + + 82%|████████▏ | 6023/7378 [20:39:16<4:36:53, 12.26s/it] + 82%|████████▏ | 6024/7378 [20:39:28<4:34:05, 12.15s/it] + +{'loss': 0.4159, 'learning_rate': 1.7153021838543294e-06, 'epoch': 0.82} + + 82%|████████▏ | 6024/7378 [20:39:28<4:34:05, 12.15s/it] + 82%|████████▏ | 6025/7378 [20:39:40<4:32:57, 12.10s/it] + +{'loss': 0.4211, 'learning_rate': 1.7128443490328983e-06, 'epoch': 0.82} + + 82%|████████▏ | 6025/7378 [20:39:40<4:32:57, 12.10s/it] + 82%|████████▏ | 6026/7378 [20:39:52<4:31:41, 12.06s/it] + +{'loss': 0.3701, 'learning_rate': 1.7103881114308451e-06, 'epoch': 0.82} + + 82%|████████▏ | 6026/7378 [20:39:52<4:31:41, 12.06s/it] + 82%|████████▏ | 6027/7378 [20:40:04<4:33:46, 12.16s/it] + +{'loss': 0.4001, 'learning_rate': 1.7079334715215724e-06, 'epoch': 0.82} + + 82%|████████▏ | 6027/7378 [20:40:04<4:33:46, 12.16s/it] + 82%|████████▏ | 6028/7378 [20:40:16<4:33:49, 12.17s/it] + +{'loss': 0.4275, 'learning_rate': 1.7054804297781714e-06, 'epoch': 0.82} + + 82%|████████▏ | 6028/7378 [20:40:17<4:33:49, 12.17s/it] + 82%|████████▏ | 6029/7378 [20:40:29<4:33:53, 12.18s/it] + +{'loss': 0.3716, 'learning_rate': 1.703028986673425e-06, 'epoch': 0.82} + + 82%|████████▏ | 6029/7378 [20:40:29<4:33:53, 12.18s/it] + 82%|████████▏ | 6030/7378 [20:40:41<4:32:48, 12.14s/it] + +{'loss': 0.4006, 'learning_rate': 1.7005791426798168e-06, 'epoch': 0.82} + + 
82%|████████▏ | 6030/7378 [20:40:41<4:32:48, 12.14s/it] + 82%|████████▏ | 6031/7378 [20:40:54<4:37:29, 12.36s/it] + +{'loss': 0.4661, 'learning_rate': 1.6981308982695133e-06, 'epoch': 0.82} + + 82%|████████▏ | 6031/7378 [20:40:54<4:37:29, 12.36s/it] + 82%|████████▏ | 6032/7378 [20:41:06<4:35:46, 12.29s/it] + +{'loss': 0.4242, 'learning_rate': 1.6956842539143747e-06, 'epoch': 0.82} + + 82%|████████▏ | 6032/7378 [20:41:06<4:35:46, 12.29s/it] + 82%|████████▏ | 6033/7378 [20:41:18<4:38:07, 12.41s/it] + +{'loss': 0.4435, 'learning_rate': 1.6932392100859506e-06, 'epoch': 0.82} + + 82%|████████▏ | 6033/7378 [20:41:18<4:38:07, 12.41s/it] + 82%|████████▏ | 6034/7378 [20:41:31<4:35:49, 12.31s/it] + +{'loss': 0.4687, 'learning_rate': 1.690795767255491e-06, 'epoch': 0.82} + + 82%|████████▏ | 6034/7378 [20:41:31<4:35:49, 12.31s/it] + 82%|████████▏ | 6035/7378 [20:41:43<4:33:39, 12.23s/it] + +{'loss': 0.5104, 'learning_rate': 1.6883539258939275e-06, 'epoch': 0.82} + + 82%|████████▏ | 6035/7378 [20:41:43<4:33:39, 12.23s/it] + 82%|████████▏ | 6036/7378 [20:41:55<4:33:12, 12.22s/it] + +{'loss': 0.4311, 'learning_rate': 1.6859136864718873e-06, 'epoch': 0.82} + + 82%|████████▏ | 6036/7378 [20:41:55<4:33:12, 12.22s/it] + 82%|████████▏ | 6037/7378 [20:42:07<4:33:46, 12.25s/it] + +{'loss': 0.4326, 'learning_rate': 1.6834750494596874e-06, 'epoch': 0.82} + + 82%|████████▏ | 6037/7378 [20:42:07<4:33:46, 12.25s/it] + 82%|████████▏ | 6038/7378 [20:42:20<4:35:43, 12.35s/it] + +{'loss': 0.4018, 'learning_rate': 1.6810380153273365e-06, 'epoch': 0.82} + + 82%|████████▏ | 6038/7378 [20:42:20<4:35:43, 12.35s/it] + 82%|████████▏ | 6039/7378 [20:42:32<4:35:49, 12.36s/it] + +{'loss': 0.4551, 'learning_rate': 1.6786025845445375e-06, 'epoch': 0.82} + + 82%|████████▏ | 6039/7378 [20:42:32<4:35:49, 12.36s/it] + 82%|████████▏ | 6040/7378 [20:42:44<4:35:28, 12.35s/it] + +{'loss': 0.4651, 'learning_rate': 1.6761687575806796e-06, 'epoch': 0.82} + + 82%|████████▏ | 6040/7378 [20:42:44<4:35:28, 12.35s/it] + 
82%|████████▏ | 6041/7378 [20:42:57<4:35:34, 12.37s/it] + +{'loss': 0.397, 'learning_rate': 1.6737365349048463e-06, 'epoch': 0.82} + + 82%|████████▏ | 6041/7378 [20:42:57<4:35:34, 12.37s/it] + 82%|████████▏ | 6042/7378 [20:43:09<4:34:10, 12.31s/it] + +{'loss': 0.3946, 'learning_rate': 1.6713059169858058e-06, 'epoch': 0.82} + + 82%|████████▏ | 6042/7378 [20:43:09<4:34:10, 12.31s/it] + 82%|████████▏ | 6043/7378 [20:43:21<4:33:00, 12.27s/it] + +{'loss': 0.4582, 'learning_rate': 1.6688769042920283e-06, 'epoch': 0.82} + + 82%|████████▏ | 6043/7378 [20:43:21<4:33:00, 12.27s/it] + 82%|████████▏ | 6044/7378 [20:43:34<4:33:34, 12.31s/it] + +{'loss': 0.4993, 'learning_rate': 1.6664494972916645e-06, 'epoch': 0.82} + + 82%|████████▏ | 6044/7378 [20:43:34<4:33:34, 12.31s/it] + 82%|████████▏ | 6045/7378 [20:43:46<4:33:47, 12.32s/it] + +{'loss': 0.4559, 'learning_rate': 1.6640236964525581e-06, 'epoch': 0.82} + + 82%|████████▏ | 6045/7378 [20:43:46<4:33:47, 12.32s/it] + 82%|████████▏ | 6046/7378 [20:43:58<4:31:34, 12.23s/it] + +{'loss': 0.4226, 'learning_rate': 1.6615995022422472e-06, 'epoch': 0.82} + + 82%|████████▏ | 6046/7378 [20:43:58<4:31:34, 12.23s/it] + 82%|████████▏ | 6047/7378 [20:44:10<4:27:52, 12.08s/it] + +{'loss': 0.4339, 'learning_rate': 1.6591769151279513e-06, 'epoch': 0.82} + + 82%|████████▏ | 6047/7378 [20:44:10<4:27:52, 12.08s/it] + 82%|████████▏ | 6048/7378 [20:44:22<4:27:23, 12.06s/it] + +{'loss': 0.4405, 'learning_rate': 1.6567559355765905e-06, 'epoch': 0.82} + + 82%|████████▏ | 6048/7378 [20:44:22<4:27:23, 12.06s/it] + 82%|████████▏ | 6049/7378 [20:44:34<4:26:43, 12.04s/it] + +{'loss': 0.4089, 'learning_rate': 1.6543365640547737e-06, 'epoch': 0.82} + + 82%|████████▏ | 6049/7378 [20:44:34<4:26:43, 12.04s/it] + 82%|████████▏ | 6050/7378 [20:44:46<4:27:25, 12.08s/it] + +{'loss': 0.4061, 'learning_rate': 1.6519188010287923e-06, 'epoch': 0.82} + + 82%|████████▏ | 6050/7378 [20:44:46<4:27:25, 12.08s/it] + 82%|████████▏ | 6051/7378 [20:44:59<4:33:13, 12.35s/it] + 
+{'loss': 0.4573, 'learning_rate': 1.6495026469646347e-06, 'epoch': 0.82} + + 82%|████████▏ | 6051/7378 [20:44:59<4:33:13, 12.35s/it] + 82%|████████▏ | 6052/7378 [20:45:11<4:32:49, 12.34s/it] + +{'loss': 0.4488, 'learning_rate': 1.6470881023279717e-06, 'epoch': 0.82} + + 82%|████████▏ | 6052/7378 [20:45:11<4:32:49, 12.34s/it] + 82%|████████▏ | 6053/7378 [20:45:23<4:30:29, 12.25s/it] + +{'loss': 0.4023, 'learning_rate': 1.6446751675841755e-06, 'epoch': 0.82} + + 82%|████████▏ | 6053/7378 [20:45:23<4:30:29, 12.25s/it] + 82%|████████▏ | 6054/7378 [20:45:35<4:28:36, 12.17s/it] + +{'loss': 0.4453, 'learning_rate': 1.6422638431982995e-06, 'epoch': 0.82} + + 82%|████████▏ | 6054/7378 [20:45:35<4:28:36, 12.17s/it] + 82%|████████▏ | 6055/7378 [20:45:47<4:28:35, 12.18s/it] + +{'loss': 0.4227, 'learning_rate': 1.6398541296350868e-06, 'epoch': 0.82} + + 82%|████████▏ | 6055/7378 [20:45:47<4:28:35, 12.18s/it] + 82%|████████▏ | 6056/7378 [20:46:00<4:30:34, 12.28s/it] + +{'loss': 0.4029, 'learning_rate': 1.6374460273589732e-06, 'epoch': 0.82} + + 82%|████████▏ | 6056/7378 [20:46:00<4:30:34, 12.28s/it] + 82%|████████▏ | 6057/7378 [20:46:12<4:30:04, 12.27s/it] + +{'loss': 0.4778, 'learning_rate': 1.635039536834081e-06, 'epoch': 0.82} + + 82%|████████▏ | 6057/7378 [20:46:12<4:30:04, 12.27s/it] + 82%|████████▏ | 6058/7378 [20:46:24<4:27:37, 12.16s/it] + +{'loss': 0.437, 'learning_rate': 1.632634658524226e-06, 'epoch': 0.82} + + 82%|████████▏ | 6058/7378 [20:46:24<4:27:37, 12.16s/it] + 82%|████████▏ | 6059/7378 [20:46:36<4:27:08, 12.15s/it] + +{'loss': 0.3842, 'learning_rate': 1.6302313928929104e-06, 'epoch': 0.82} + + 82%|████████▏ | 6059/7378 [20:46:36<4:27:08, 12.15s/it] + 82%|████████▏ | 6060/7378 [20:46:49<4:28:24, 12.22s/it] + +{'loss': 0.4226, 'learning_rate': 1.627829740403325e-06, 'epoch': 0.82} + + 82%|████████▏ | 6060/7378 [20:46:49<4:28:24, 12.22s/it] + 82%|████████▏ | 6061/7378 [20:47:01<4:29:45, 12.29s/it] + +{'loss': 0.4582, 'learning_rate': 1.6254297015183496e-06, 
'epoch': 0.82} + + 82%|████████▏ | 6061/7378 [20:47:01<4:29:45, 12.29s/it] + 82%|████████▏ | 6062/7378 [20:47:13<4:29:14, 12.28s/it] + +{'loss': 0.4283, 'learning_rate': 1.6230312767005574e-06, 'epoch': 0.82} + + 82%|████████▏ | 6062/7378 [20:47:13<4:29:14, 12.28s/it] + 82%|████████▏ | 6063/7378 [20:47:26<4:30:17, 12.33s/it] + +{'loss': 0.4852, 'learning_rate': 1.6206344664122042e-06, 'epoch': 0.82} + + 82%|████████▏ | 6063/7378 [20:47:26<4:30:17, 12.33s/it] + 82%|████████▏ | 6064/7378 [20:47:38<4:30:20, 12.34s/it] + +{'loss': 0.4284, 'learning_rate': 1.6182392711152406e-06, 'epoch': 0.82} + + 82%|████████▏ | 6064/7378 [20:47:38<4:30:20, 12.34s/it] + 82%|████████▏ | 6065/7378 [20:47:50<4:27:35, 12.23s/it] + +{'loss': 0.3871, 'learning_rate': 1.6158456912712995e-06, 'epoch': 0.82} + + 82%|████████▏ | 6065/7378 [20:47:50<4:27:35, 12.23s/it] + 82%|████████▏ | 6066/7378 [20:48:02<4:27:22, 12.23s/it] + +{'loss': 0.3873, 'learning_rate': 1.613453727341706e-06, 'epoch': 0.82} + + 82%|████████▏ | 6066/7378 [20:48:02<4:27:22, 12.23s/it] + 82%|████████▏ | 6067/7378 [20:48:15<4:29:09, 12.32s/it] + +{'loss': 0.4032, 'learning_rate': 1.6110633797874776e-06, 'epoch': 0.82} + + 82%|████████▏ | 6067/7378 [20:48:15<4:29:09, 12.32s/it] + 82%|████████▏ | 6068/7378 [20:48:27<4:28:53, 12.32s/it] + +{'loss': 0.4475, 'learning_rate': 1.608674649069313e-06, 'epoch': 0.82} + + 82%|████████▏ | 6068/7378 [20:48:27<4:28:53, 12.32s/it] + 82%|████████▏ | 6069/7378 [20:48:39<4:28:08, 12.29s/it] + +{'loss': 0.389, 'learning_rate': 1.606287535647605e-06, 'epoch': 0.82} + + 82%|████████▏ | 6069/7378 [20:48:39<4:28:08, 12.29s/it] + 82%|████████▏ | 6070/7378 [20:48:52<4:27:38, 12.28s/it] + +{'loss': 0.4404, 'learning_rate': 1.6039020399824268e-06, 'epoch': 0.82} + + 82%|████████▏ | 6070/7378 [20:48:52<4:27:38, 12.28s/it] + 82%|████████▏ | 6071/7378 [20:49:04<4:27:14, 12.27s/it] + +{'loss': 0.3558, 'learning_rate': 1.601518162533553e-06, 'epoch': 0.82} + + 82%|████████▏ | 6071/7378 [20:49:04<4:27:14, 
12.27s/it] + 82%|████████▏ | 6072/7378 [20:49:16<4:25:43, 12.21s/it] + +{'loss': 0.4363, 'learning_rate': 1.5991359037604338e-06, 'epoch': 0.82} + + 82%|████████▏ | 6072/7378 [20:49:16<4:25:43, 12.21s/it] + 82%|████████▏ | 6073/7378 [20:49:28<4:26:06, 12.24s/it] + +{'loss': 0.5137, 'learning_rate': 1.596755264122214e-06, 'epoch': 0.82} + + 82%|████████▏ | 6073/7378 [20:49:28<4:26:06, 12.24s/it] + 82%|████████▏ | 6074/7378 [20:49:40<4:26:27, 12.26s/it] + +{'loss': 0.4586, 'learning_rate': 1.5943762440777243e-06, 'epoch': 0.82} + + 82%|████████▏ | 6074/7378 [20:49:40<4:26:27, 12.26s/it] + 82%|████████▏ | 6075/7378 [20:49:53<4:26:26, 12.27s/it] + +{'loss': 0.4787, 'learning_rate': 1.5919988440854805e-06, 'epoch': 0.82} + + 82%|████████▏ | 6075/7378 [20:49:53<4:26:26, 12.27s/it] + 82%|████████▏ | 6076/7378 [20:50:05<4:26:32, 12.28s/it] + +{'loss': 0.4681, 'learning_rate': 1.5896230646036937e-06, 'epoch': 0.82} + + 82%|████████▏ | 6076/7378 [20:50:05<4:26:32, 12.28s/it] + 82%|████████▏ | 6077/7378 [20:50:17<4:25:30, 12.24s/it] + +{'loss': 0.4393, 'learning_rate': 1.5872489060902562e-06, 'epoch': 0.82} + + 82%|████████▏ | 6077/7378 [20:50:17<4:25:30, 12.24s/it] + 82%|████████▏ | 6078/7378 [20:50:30<4:25:32, 12.26s/it] + +{'loss': 0.3906, 'learning_rate': 1.5848763690027514e-06, 'epoch': 0.82} + + 82%|████████▏ | 6078/7378 [20:50:30<4:25:32, 12.26s/it] + 82%|████████▏ | 6079/7378 [20:50:41<4:23:20, 12.16s/it] + +{'loss': 0.4418, 'learning_rate': 1.5825054537984464e-06, 'epoch': 0.82} + + 82%|████████▏ | 6079/7378 [20:50:41<4:23:20, 12.16s/it] + 82%|████████▏ | 6080/7378 [20:50:54<4:25:02, 12.25s/it] + +{'loss': 0.437, 'learning_rate': 1.5801361609342958e-06, 'epoch': 0.82} + + 82%|████████▏ | 6080/7378 [20:50:54<4:25:02, 12.25s/it] + 82%|████████▏ | 6081/7378 [20:51:06<4:25:23, 12.28s/it] + +{'loss': 0.4107, 'learning_rate': 1.5777684908669499e-06, 'epoch': 0.82} + + 82%|████████▏ | 6081/7378 [20:51:06<4:25:23, 12.28s/it] + 82%|████████▏ | 6082/7378 [20:51:19<4:25:23, 
12.29s/it] + +{'loss': 0.4385, 'learning_rate': 1.575402444052736e-06, 'epoch': 0.82} + + 82%|████████▏ | 6082/7378 [20:51:19<4:25:23, 12.29s/it] + 82%|████████▏ | 6083/7378 [20:51:31<4:27:43, 12.40s/it] + +{'loss': 0.4527, 'learning_rate': 1.5730380209476737e-06, 'epoch': 0.82} + + 82%|████████▏ | 6083/7378 [20:51:31<4:27:43, 12.40s/it] + 82%|████████▏ | 6084/7378 [20:51:44<4:27:20, 12.40s/it] + +{'loss': 0.3743, 'learning_rate': 1.5706752220074661e-06, 'epoch': 0.82} + + 82%|████████▏ | 6084/7378 [20:51:44<4:27:20, 12.40s/it] + 82%|████████▏ | 6085/7378 [20:51:56<4:27:25, 12.41s/it] + +{'loss': 0.4917, 'learning_rate': 1.5683140476875092e-06, 'epoch': 0.82} + + 82%|████████▏ | 6085/7378 [20:51:56<4:27:25, 12.41s/it] + 82%|████████▏ | 6086/7378 [20:52:08<4:25:01, 12.31s/it] + +{'loss': 0.4756, 'learning_rate': 1.565954498442882e-06, 'epoch': 0.82} + + 82%|████████▏ | 6086/7378 [20:52:08<4:25:01, 12.31s/it] + 83%|████████▎ | 6087/7378 [20:52:21<4:25:33, 12.34s/it] + +{'loss': 0.4617, 'learning_rate': 1.5635965747283488e-06, 'epoch': 0.83} + + 83%|████████▎ | 6087/7378 [20:52:21<4:25:33, 12.34s/it] + 83%|████████▎ | 6088/7378 [20:52:33<4:24:51, 12.32s/it] + +{'loss': 0.379, 'learning_rate': 1.5612402769983625e-06, 'epoch': 0.83} + + 83%|████████▎ | 6088/7378 [20:52:33<4:24:51, 12.32s/it] + 83%|████████▎ | 6089/7378 [20:52:45<4:25:39, 12.37s/it] + +{'loss': 0.3764, 'learning_rate': 1.5588856057070612e-06, 'epoch': 0.83} + + 83%|████████▎ | 6089/7378 [20:52:45<4:25:39, 12.37s/it] + 83%|████████▎ | 6090/7378 [20:52:57<4:23:52, 12.29s/it] + +{'loss': 0.4297, 'learning_rate': 1.556532561308275e-06, 'epoch': 0.83} + + 83%|████████▎ | 6090/7378 [20:52:57<4:23:52, 12.29s/it] + 83%|████████▎ | 6091/7378 [20:53:09<4:22:01, 12.22s/it] + +{'loss': 0.5114, 'learning_rate': 1.5541811442555122e-06, 'epoch': 0.83} + + 83%|████████▎ | 6091/7378 [20:53:09<4:22:01, 12.22s/it] + 83%|████████▎ | 6092/7378 [20:53:22<4:23:09, 12.28s/it] + +{'loss': 0.3625, 'learning_rate': 
1.551831355001976e-06, 'epoch': 0.83} + + 83%|████████▎ | 6092/7378 [20:53:22<4:23:09, 12.28s/it] + 83%|████████▎ | 6093/7378 [20:53:34<4:21:39, 12.22s/it] + +{'loss': 0.39, 'learning_rate': 1.5494831940005484e-06, 'epoch': 0.83} + + 83%|████████▎ | 6093/7378 [20:53:34<4:21:39, 12.22s/it] + 83%|████████▎ | 6094/7378 [20:53:46<4:18:48, 12.09s/it] + +{'loss': 0.4062, 'learning_rate': 1.5471366617037998e-06, 'epoch': 0.83} + + 83%|████████▎ | 6094/7378 [20:53:46<4:18:48, 12.09s/it] + 83%|████████▎ | 6095/7378 [20:53:58<4:17:55, 12.06s/it] + +{'loss': 0.4062, 'learning_rate': 1.5447917585639905e-06, 'epoch': 0.83} + + 83%|████████▎ | 6095/7378 [20:53:58<4:17:55, 12.06s/it] + 83%|████████▎ | 6096/7378 [20:54:10<4:20:28, 12.19s/it] + +{'loss': 0.4223, 'learning_rate': 1.5424484850330623e-06, 'epoch': 0.83} + + 83%|████████▎ | 6096/7378 [20:54:10<4:20:28, 12.19s/it] + 83%|████████▎ | 6097/7378 [20:54:22<4:19:53, 12.17s/it] + +{'loss': 0.4284, 'learning_rate': 1.5401068415626442e-06, 'epoch': 0.83} + + 83%|████████▎ | 6097/7378 [20:54:22<4:19:53, 12.17s/it] + 83%|████████▎ | 6098/7378 [20:54:35<4:19:31, 12.17s/it] + +{'loss': 0.4041, 'learning_rate': 1.5377668286040525e-06, 'epoch': 0.83} + + 83%|████████▎ | 6098/7378 [20:54:35<4:19:31, 12.17s/it] + 83%|████████▎ | 6099/7378 [20:54:47<4:21:56, 12.29s/it] + +{'loss': 0.4538, 'learning_rate': 1.5354284466082836e-06, 'epoch': 0.83} + + 83%|████████▎ | 6099/7378 [20:54:47<4:21:56, 12.29s/it] + 83%|████████▎ | 6100/7378 [20:54:59<4:20:57, 12.25s/it] + +{'loss': 0.4373, 'learning_rate': 1.5330916960260312e-06, 'epoch': 0.83} + + 83%|████████▎ | 6100/7378 [20:54:59<4:20:57, 12.25s/it] + 83%|████████▎ | 6101/7378 [20:55:12<4:21:53, 12.31s/it] + +{'loss': 0.3979, 'learning_rate': 1.5307565773076626e-06, 'epoch': 0.83} + + 83%|████████▎ | 6101/7378 [20:55:12<4:21:53, 12.31s/it] + 83%|████████▎ | 6102/7378 [20:55:24<4:20:09, 12.23s/it] + +{'loss': 0.4454, 'learning_rate': 1.528423090903236e-06, 'epoch': 0.83} + + 83%|████████▎ | 
6102/7378 [20:55:24<4:20:09, 12.23s/it] + 83%|████████▎ | 6103/7378 [20:55:37<4:23:21, 12.39s/it] + +{'loss': 0.4876, 'learning_rate': 1.5260912372624925e-06, 'epoch': 0.83} + + 83%|████████▎ | 6103/7378 [20:55:37<4:23:21, 12.39s/it] + 83%|████████▎ | 6104/7378 [20:55:49<4:21:26, 12.31s/it] + +{'loss': 0.4066, 'learning_rate': 1.523761016834866e-06, 'epoch': 0.83} + + 83%|████████▎ | 6104/7378 [20:55:49<4:21:26, 12.31s/it] + 83%|████████▎ | 6105/7378 [20:56:01<4:24:28, 12.47s/it] + +{'loss': 0.4246, 'learning_rate': 1.521432430069465e-06, 'epoch': 0.83} + + 83%|████████▎ | 6105/7378 [20:56:01<4:24:28, 12.47s/it] + 83%|████████▎ | 6106/7378 [20:56:14<4:24:24, 12.47s/it] + +{'loss': 0.4991, 'learning_rate': 1.5191054774150905e-06, 'epoch': 0.83} + + 83%|████████▎ | 6106/7378 [20:56:14<4:24:24, 12.47s/it] + 83%|████████▎ | 6107/7378 [20:56:27<4:24:43, 12.50s/it] + +{'loss': 0.3534, 'learning_rate': 1.5167801593202248e-06, 'epoch': 0.83} + + 83%|████████▎ | 6107/7378 [20:56:27<4:24:43, 12.50s/it] + 83%|████████▎ | 6108/7378 [20:56:39<4:24:12, 12.48s/it] + +{'loss': 0.4812, 'learning_rate': 1.514456476233035e-06, 'epoch': 0.83} + + 83%|████████▎ | 6108/7378 [20:56:39<4:24:12, 12.48s/it] + 83%|████████▎ | 6109/7378 [20:56:51<4:20:46, 12.33s/it] + +{'loss': 0.3631, 'learning_rate': 1.5121344286013784e-06, 'epoch': 0.83} + + 83%|████████▎ | 6109/7378 [20:56:51<4:20:46, 12.33s/it] + 83%|████████▎ | 6110/7378 [20:57:03<4:19:20, 12.27s/it] + +{'loss': 0.4456, 'learning_rate': 1.5098140168727916e-06, 'epoch': 0.83} + + 83%|████████▎ | 6110/7378 [20:57:03<4:19:20, 12.27s/it] + 83%|████████▎ | 6111/7378 [20:57:15<4:19:18, 12.28s/it] + +{'loss': 0.4394, 'learning_rate': 1.5074952414944976e-06, 'epoch': 0.83} + + 83%|████████▎ | 6111/7378 [20:57:15<4:19:18, 12.28s/it] + 83%|████████▎ | 6112/7378 [20:57:28<4:19:06, 12.28s/it] + +{'loss': 0.4008, 'learning_rate': 1.5051781029134016e-06, 'epoch': 0.83} + + 83%|████████▎ | 6112/7378 [20:57:28<4:19:06, 12.28s/it] + 83%|████████▎ | 
6113/7378 [20:57:40<4:20:08, 12.34s/it] + +{'loss': 0.3598, 'learning_rate': 1.5028626015760995e-06, 'epoch': 0.83} + + 83%|████████▎ | 6113/7378 [20:57:40<4:20:08, 12.34s/it] + 83%|████████▎ | 6114/7378 [20:57:53<4:21:14, 12.40s/it] + +{'loss': 0.4303, 'learning_rate': 1.5005487379288675e-06, 'epoch': 0.83} + + 83%|████████▎ | 6114/7378 [20:57:53<4:21:14, 12.40s/it] + 83%|████████▎ | 6115/7378 [20:58:05<4:22:10, 12.46s/it] + +{'loss': 0.4466, 'learning_rate': 1.4982365124176645e-06, 'epoch': 0.83} + + 83%|████████▎ | 6115/7378 [20:58:05<4:22:10, 12.46s/it] + 83%|████████▎ | 6116/7378 [20:58:18<4:22:00, 12.46s/it] + +{'loss': 0.4035, 'learning_rate': 1.4959259254881375e-06, 'epoch': 0.83} + + 83%|████████▎ | 6116/7378 [20:58:18<4:22:00, 12.46s/it] + 83%|████████▎ | 6117/7378 [20:58:30<4:20:34, 12.40s/it] + +{'loss': 0.3946, 'learning_rate': 1.4936169775856124e-06, 'epoch': 0.83} + + 83%|████████▎ | 6117/7378 [20:58:30<4:20:34, 12.40s/it] + 83%|████████▎ | 6118/7378 [20:58:42<4:18:39, 12.32s/it] + +{'loss': 0.4389, 'learning_rate': 1.4913096691551077e-06, 'epoch': 0.83} + + 83%|████████▎ | 6118/7378 [20:58:42<4:18:39, 12.32s/it] + 83%|████████▎ | 6119/7378 [20:58:55<4:20:34, 12.42s/it] + +{'loss': 0.3923, 'learning_rate': 1.4890040006413187e-06, 'epoch': 0.83} + + 83%|████████▎ | 6119/7378 [20:58:55<4:20:34, 12.42s/it] + 83%|████████▎ | 6120/7378 [20:59:07<4:18:47, 12.34s/it] + +{'loss': 0.3765, 'learning_rate': 1.4866999724886277e-06, 'epoch': 0.83} + + 83%|████████▎ | 6120/7378 [20:59:07<4:18:47, 12.34s/it] + 83%|████████▎ | 6121/7378 [20:59:19<4:17:15, 12.28s/it] + +{'loss': 0.4087, 'learning_rate': 1.4843975851410964e-06, 'epoch': 0.83} + + 83%|████████▎ | 6121/7378 [20:59:19<4:17:15, 12.28s/it] + 83%|████████▎ | 6122/7378 [20:59:31<4:17:27, 12.30s/it] + +{'loss': 0.4324, 'learning_rate': 1.4820968390424783e-06, 'epoch': 0.83} + + 83%|████████▎ | 6122/7378 [20:59:31<4:17:27, 12.30s/it] + 83%|████████▎ | 6123/7378 [20:59:43<4:15:52, 12.23s/it] + +{'loss': 0.4804, 
'learning_rate': 1.4797977346362046e-06, 'epoch': 0.83} + + 83%|████████▎ | 6123/7378 [20:59:43<4:15:52, 12.23s/it] + 83%|████████▎ | 6124/7378 [20:59:56<4:19:38, 12.42s/it] + +{'loss': 0.4595, 'learning_rate': 1.477500272365392e-06, 'epoch': 0.83} + + 83%|████████▎ | 6124/7378 [20:59:56<4:19:38, 12.42s/it] + 83%|████████▎ | 6125/7378 [21:00:08<4:17:15, 12.32s/it] + +{'loss': 0.469, 'learning_rate': 1.4752044526728393e-06, 'epoch': 0.83} + + 83%|████████▎ | 6125/7378 [21:00:08<4:17:15, 12.32s/it] + 83%|████████▎ | 6126/7378 [21:00:21<4:18:53, 12.41s/it] + +{'loss': 0.4199, 'learning_rate': 1.4729102760010282e-06, 'epoch': 0.83} + + 83%|████████▎ | 6126/7378 [21:00:21<4:18:53, 12.41s/it] + 83%|████████▎ | 6127/7378 [21:00:33<4:18:47, 12.41s/it] + +{'loss': 0.3763, 'learning_rate': 1.4706177427921297e-06, 'epoch': 0.83} + + 83%|████████▎ | 6127/7378 [21:00:33<4:18:47, 12.41s/it] + 83%|████████▎ | 6128/7378 [21:00:46<4:19:43, 12.47s/it] + +{'loss': 0.4284, 'learning_rate': 1.4683268534879925e-06, 'epoch': 0.83} + + 83%|████████▎ | 6128/7378 [21:00:46<4:19:43, 12.47s/it] + 83%|████████▎ | 6129/7378 [21:00:58<4:18:29, 12.42s/it] + +{'loss': 0.3334, 'learning_rate': 1.4660376085301476e-06, 'epoch': 0.83} + + 83%|████████▎ | 6129/7378 [21:00:58<4:18:29, 12.42s/it] + 83%|████████▎ | 6130/7378 [21:01:11<4:21:06, 12.55s/it] + +{'loss': 0.4459, 'learning_rate': 1.46375000835981e-06, 'epoch': 0.83} + + 83%|████████▎ | 6130/7378 [21:01:11<4:21:06, 12.55s/it] + 83%|████████▎ | 6131/7378 [21:01:24<4:19:46, 12.50s/it] + +{'loss': 0.3834, 'learning_rate': 1.4614640534178825e-06, 'epoch': 0.83} + + 83%|████████▎ | 6131/7378 [21:01:24<4:19:46, 12.50s/it] + 83%|████████▎ | 6132/7378 [21:01:35<4:15:30, 12.30s/it] + +{'loss': 0.4589, 'learning_rate': 1.459179744144945e-06, 'epoch': 0.83} + + 83%|████████▎ | 6132/7378 [21:01:35<4:15:30, 12.30s/it] + 83%|████████▎ | 6133/7378 [21:01:48<4:17:38, 12.42s/it] + +{'loss': 0.4373, 'learning_rate': 1.4568970809812643e-06, 'epoch': 0.83} + + 
83%|████████▎ | 6133/7378 [21:01:48<4:17:38, 12.42s/it] + 83%|████████▎ | 6134/7378 [21:02:00<4:16:55, 12.39s/it] + +{'loss': 0.4468, 'learning_rate': 1.454616064366785e-06, 'epoch': 0.83} + + 83%|████████▎ | 6134/7378 [21:02:00<4:16:55, 12.39s/it] + 83%|████████▎ | 6135/7378 [21:02:13<4:15:12, 12.32s/it] + +{'loss': 0.4755, 'learning_rate': 1.4523366947411366e-06, 'epoch': 0.83} + + 83%|████████▎ | 6135/7378 [21:02:13<4:15:12, 12.32s/it] + 83%|████████▎ | 6136/7378 [21:02:25<4:15:05, 12.32s/it] + +{'loss': 0.3883, 'learning_rate': 1.4500589725436344e-06, 'epoch': 0.83} + + 83%|████████▎ | 6136/7378 [21:02:25<4:15:05, 12.32s/it] + 83%|████████▎ | 6137/7378 [21:02:37<4:12:47, 12.22s/it] + +{'loss': 0.4276, 'learning_rate': 1.4477828982132758e-06, 'epoch': 0.83} + + 83%|████████▎ | 6137/7378 [21:02:37<4:12:47, 12.22s/it] + 83%|████████▎ | 6138/7378 [21:02:49<4:11:30, 12.17s/it] + +{'loss': 0.4575, 'learning_rate': 1.4455084721887346e-06, 'epoch': 0.83} + + 83%|████████▎ | 6138/7378 [21:02:49<4:11:30, 12.17s/it] + 83%|████████▎ | 6139/7378 [21:03:01<4:10:32, 12.13s/it] + +{'loss': 0.4831, 'learning_rate': 1.4432356949083726e-06, 'epoch': 0.83} + + 83%|████████▎ | 6139/7378 [21:03:01<4:10:32, 12.13s/it] + 83%|████████▎ | 6140/7378 [21:03:13<4:08:50, 12.06s/it] + +{'loss': 0.3823, 'learning_rate': 1.4409645668102313e-06, 'epoch': 0.83} + + 83%|████████▎ | 6140/7378 [21:03:13<4:08:50, 12.06s/it] + 83%|████████▎ | 6141/7378 [21:03:25<4:09:50, 12.12s/it] + +{'loss': 0.403, 'learning_rate': 1.4386950883320327e-06, 'epoch': 0.83} + + 83%|████████▎ | 6141/7378 [21:03:25<4:09:50, 12.12s/it] + 83%|████████▎ | 6142/7378 [21:03:37<4:10:43, 12.17s/it] + +{'loss': 0.4245, 'learning_rate': 1.4364272599111883e-06, 'epoch': 0.83} + + 83%|████████▎ | 6142/7378 [21:03:37<4:10:43, 12.17s/it] + 83%|████████▎ | 6143/7378 [21:03:50<4:11:09, 12.20s/it] + +{'loss': 0.4673, 'learning_rate': 1.434161081984784e-06, 'epoch': 0.83} + + 83%|████████▎ | 6143/7378 [21:03:50<4:11:09, 12.20s/it] + 
83%|████████▎ | 6144/7378 [21:04:02<4:13:17, 12.32s/it] + +{'loss': 0.4312, 'learning_rate': 1.4318965549895903e-06, 'epoch': 0.83} + + 83%|████████▎ | 6144/7378 [21:04:02<4:13:17, 12.32s/it] + 83%|████████▎ | 6145/7378 [21:04:14<4:12:04, 12.27s/it] + +{'loss': 0.4388, 'learning_rate': 1.4296336793620557e-06, 'epoch': 0.83} + + 83%|████████▎ | 6145/7378 [21:04:14<4:12:04, 12.27s/it] + 83%|████████▎ | 6146/7378 [21:04:27<4:12:06, 12.28s/it] + +{'loss': 0.4671, 'learning_rate': 1.4273724555383195e-06, 'epoch': 0.83} + + 83%|████████▎ | 6146/7378 [21:04:27<4:12:06, 12.28s/it] + 83%|████████▎ | 6147/7378 [21:04:39<4:10:31, 12.21s/it] + +{'loss': 0.4421, 'learning_rate': 1.425112883954195e-06, 'epoch': 0.83} + + 83%|████████▎ | 6147/7378 [21:04:39<4:10:31, 12.21s/it] + 83%|████████▎ | 6148/7378 [21:04:52<4:13:24, 12.36s/it] + +{'loss': 0.3811, 'learning_rate': 1.4228549650451794e-06, 'epoch': 0.83} + + 83%|████████▎ | 6148/7378 [21:04:52<4:13:24, 12.36s/it] + 83%|████████▎ | 6149/7378 [21:05:04<4:11:48, 12.29s/it] + +{'loss': 0.456, 'learning_rate': 1.4205986992464515e-06, 'epoch': 0.83} + + 83%|████████▎ | 6149/7378 [21:05:04<4:11:48, 12.29s/it] + 83%|████████▎ | 6150/7378 [21:05:16<4:11:02, 12.27s/it] + +{'loss': 0.4843, 'learning_rate': 1.4183440869928678e-06, 'epoch': 0.83} + + 83%|████████▎ | 6150/7378 [21:05:16<4:11:02, 12.27s/it] + 83%|████████▎ | 6151/7378 [21:05:29<4:13:22, 12.39s/it] + +{'loss': 0.4645, 'learning_rate': 1.4160911287189737e-06, 'epoch': 0.83} + + 83%|████████▎ | 6151/7378 [21:05:29<4:13:22, 12.39s/it] + 83%|████████▎ | 6152/7378 [21:05:41<4:11:13, 12.29s/it] + +{'loss': 0.4105, 'learning_rate': 1.4138398248589913e-06, 'epoch': 0.83} + + 83%|████████▎ | 6152/7378 [21:05:41<4:11:13, 12.29s/it] + 83%|████████▎ | 6153/7378 [21:05:53<4:08:50, 12.19s/it] + +{'loss': 0.4965, 'learning_rate': 1.411590175846822e-06, 'epoch': 0.83} + + 83%|████████▎ | 6153/7378 [21:05:53<4:08:50, 12.19s/it] + 83%|████████▎ | 6154/7378 [21:06:05<4:08:42, 12.19s/it] + 
+{'loss': 0.4296, 'learning_rate': 1.4093421821160502e-06, 'epoch': 0.83} + + 83%|████████▎ | 6154/7378 [21:06:05<4:08:42, 12.19s/it] + 83%|████████▎ | 6155/7378 [21:06:17<4:09:24, 12.24s/it] + +{'loss': 0.4334, 'learning_rate': 1.4070958440999438e-06, 'epoch': 0.83} + + 83%|████████▎ | 6155/7378 [21:06:17<4:09:24, 12.24s/it] + 83%|████████▎ | 6156/7378 [21:06:29<4:09:52, 12.27s/it] + +{'loss': 0.4266, 'learning_rate': 1.4048511622314488e-06, 'epoch': 0.83} + + 83%|████████▎ | 6156/7378 [21:06:29<4:09:52, 12.27s/it] + 83%|████████▎ | 6157/7378 [21:06:42<4:08:56, 12.23s/it] + +{'loss': 0.4116, 'learning_rate': 1.4026081369431909e-06, 'epoch': 0.83} + + 83%|████████▎ | 6157/7378 [21:06:42<4:08:56, 12.23s/it] + 83%|████████▎ | 6158/7378 [21:06:54<4:07:41, 12.18s/it] + +{'loss': 0.4008, 'learning_rate': 1.4003667686674792e-06, 'epoch': 0.83} + + 83%|████████▎ | 6158/7378 [21:06:54<4:07:41, 12.18s/it] + 83%|████████▎ | 6159/7378 [21:07:06<4:09:21, 12.27s/it] + +{'loss': 0.4478, 'learning_rate': 1.3981270578363004e-06, 'epoch': 0.83} + + 83%|████████▎ | 6159/7378 [21:07:06<4:09:21, 12.27s/it] + 83%|████████▎ | 6160/7378 [21:07:19<4:15:34, 12.59s/it] + +{'loss': 0.4914, 'learning_rate': 1.3958890048813267e-06, 'epoch': 0.83} + + 83%|████████▎ | 6160/7378 [21:07:19<4:15:34, 12.59s/it] + 84%|████████▎ | 6161/7378 [21:07:32<4:12:30, 12.45s/it] + +{'loss': 0.4887, 'learning_rate': 1.3936526102339077e-06, 'epoch': 0.84} + + 84%|████████▎ | 6161/7378 [21:07:32<4:12:30, 12.45s/it] + 84%|████████▎ | 6162/7378 [21:07:44<4:12:57, 12.48s/it] + +{'loss': 0.3993, 'learning_rate': 1.3914178743250707e-06, 'epoch': 0.84} + + 84%|████████▎ | 6162/7378 [21:07:44<4:12:57, 12.48s/it] + 84%|████████▎ | 6163/7378 [21:07:56<4:09:46, 12.33s/it] + +{'loss': 0.4122, 'learning_rate': 1.3891847975855255e-06, 'epoch': 0.84} + + 84%|████████▎ | 6163/7378 [21:07:56<4:09:46, 12.33s/it] + 84%|████████▎ | 6164/7378 [21:08:08<4:08:14, 12.27s/it] + +{'loss': 0.4136, 'learning_rate': 1.386953380445667e-06, 
'epoch': 0.84} + + 84%|████████▎ | 6164/7378 [21:08:08<4:08:14, 12.27s/it] + 84%|████████▎ | 6165/7378 [21:08:20<4:07:21, 12.24s/it] + +{'loss': 0.4309, 'learning_rate': 1.3847236233355621e-06, 'epoch': 0.84} + + 84%|████████▎ | 6165/7378 [21:08:20<4:07:21, 12.24s/it] + 84%|████████▎ | 6166/7378 [21:08:33<4:06:14, 12.19s/it] + +{'loss': 0.4044, 'learning_rate': 1.3824955266849637e-06, 'epoch': 0.84} + + 84%|████████▎ | 6166/7378 [21:08:33<4:06:14, 12.19s/it] + 84%|████████▎ | 6167/7378 [21:08:45<4:06:14, 12.20s/it] + +{'loss': 0.4442, 'learning_rate': 1.380269090923302e-06, 'epoch': 0.84} + + 84%|████████▎ | 6167/7378 [21:08:45<4:06:14, 12.20s/it] + 84%|████████▎ | 6168/7378 [21:08:57<4:07:18, 12.26s/it] + +{'loss': 0.3802, 'learning_rate': 1.3780443164796854e-06, 'epoch': 0.84} + + 84%|████████▎ | 6168/7378 [21:08:57<4:07:18, 12.26s/it] + 84%|████████▎ | 6169/7378 [21:09:10<4:10:19, 12.42s/it] + +{'loss': 0.4202, 'learning_rate': 1.3758212037829076e-06, 'epoch': 0.84} + + 84%|████████▎ | 6169/7378 [21:09:10<4:10:19, 12.42s/it] + 84%|████████▎ | 6170/7378 [21:09:22<4:09:29, 12.39s/it] + +{'loss': 0.4498, 'learning_rate': 1.3735997532614375e-06, 'epoch': 0.84} + + 84%|████████▎ | 6170/7378 [21:09:22<4:09:29, 12.39s/it] + 84%|████████▎ | 6171/7378 [21:09:35<4:08:51, 12.37s/it] + +{'loss': 0.4393, 'learning_rate': 1.3713799653434246e-06, 'epoch': 0.84} + + 84%|████████▎ | 6171/7378 [21:09:35<4:08:51, 12.37s/it] + 84%|████████▎ | 6172/7378 [21:09:47<4:10:16, 12.45s/it] + +{'loss': 0.3743, 'learning_rate': 1.3691618404566954e-06, 'epoch': 0.84} + + 84%|████████▎ | 6172/7378 [21:09:47<4:10:16, 12.45s/it] + 84%|████████▎ | 6173/7378 [21:10:00<4:09:05, 12.40s/it] + +{'loss': 0.4367, 'learning_rate': 1.3669453790287646e-06, 'epoch': 0.84} + + 84%|████████▎ | 6173/7378 [21:10:00<4:09:05, 12.40s/it] + 84%|████████▎ | 6174/7378 [21:10:12<4:08:38, 12.39s/it] + +{'loss': 0.4398, 'learning_rate': 1.3647305814868173e-06, 'epoch': 0.84} + + 84%|████████▎ | 6174/7378 
[21:10:12<4:08:38, 12.39s/it] + 84%|████████▎ | 6175/7378 [21:10:24<4:08:55, 12.42s/it] + +{'loss': 0.4009, 'learning_rate': 1.3625174482577208e-06, 'epoch': 0.84} + + 84%|████████▎ | 6175/7378 [21:10:24<4:08:55, 12.42s/it] + 84%|████████▎ | 6176/7378 [21:10:37<4:07:33, 12.36s/it] + +{'loss': 0.38, 'learning_rate': 1.3603059797680218e-06, 'epoch': 0.84} + + 84%|████████▎ | 6176/7378 [21:10:37<4:07:33, 12.36s/it] + 84%|████████▎ | 6177/7378 [21:10:49<4:06:26, 12.31s/it] + +{'loss': 0.468, 'learning_rate': 1.3580961764439449e-06, 'epoch': 0.84} + + 84%|████████▎ | 6177/7378 [21:10:49<4:06:26, 12.31s/it] + 84%|████████▎ | 6178/7378 [21:11:01<4:05:39, 12.28s/it] + +{'loss': 0.428, 'learning_rate': 1.3558880387113993e-06, 'epoch': 0.84} + + 84%|████████▎ | 6178/7378 [21:11:01<4:05:39, 12.28s/it] + 84%|████████▎ | 6179/7378 [21:11:13<4:05:12, 12.27s/it] + +{'loss': 0.438, 'learning_rate': 1.3536815669959635e-06, 'epoch': 0.84} + + 84%|████████▎ | 6179/7378 [21:11:13<4:05:12, 12.27s/it] + 84%|████████▍ | 6180/7378 [21:11:25<4:03:51, 12.21s/it] + +{'loss': 0.4137, 'learning_rate': 1.3514767617229051e-06, 'epoch': 0.84} + + 84%|████████▍ | 6180/7378 [21:11:25<4:03:51, 12.21s/it] + 84%|████████▍ | 6181/7378 [21:11:38<4:06:08, 12.34s/it] + +{'loss': 0.528, 'learning_rate': 1.3492736233171644e-06, 'epoch': 0.84} + + 84%|████████▍ | 6181/7378 [21:11:38<4:06:08, 12.34s/it] + 84%|████████▍ | 6182/7378 [21:11:50<4:04:25, 12.26s/it] + +{'loss': 0.4596, 'learning_rate': 1.3470721522033592e-06, 'epoch': 0.84} + + 84%|████████▍ | 6182/7378 [21:11:50<4:04:25, 12.26s/it] + 84%|████████▍ | 6183/7378 [21:12:02<4:03:14, 12.21s/it] + +{'loss': 0.4394, 'learning_rate': 1.3448723488057925e-06, 'epoch': 0.84} + + 84%|████████▍ | 6183/7378 [21:12:02<4:03:14, 12.21s/it] + 84%|████████▍ | 6184/7378 [21:12:14<4:03:01, 12.21s/it] + +{'loss': 0.4224, 'learning_rate': 1.3426742135484405e-06, 'epoch': 0.84} + + 84%|████████▍ | 6184/7378 [21:12:14<4:03:01, 12.21s/it] + 84%|████████▍ | 6185/7378 
[21:12:27<4:03:28, 12.25s/it] + +{'loss': 0.398, 'learning_rate': 1.340477746854959e-06, 'epoch': 0.84} + + 84%|████████▍ | 6185/7378 [21:12:27<4:03:28, 12.25s/it] + 84%|████████▍ | 6186/7378 [21:12:39<4:04:37, 12.31s/it] + +{'loss': 0.4473, 'learning_rate': 1.3382829491486814e-06, 'epoch': 0.84} + + 84%|████████▍ | 6186/7378 [21:12:39<4:04:37, 12.31s/it] + 84%|████████▍ | 6187/7378 [21:12:51<4:03:53, 12.29s/it] + +{'loss': 0.4108, 'learning_rate': 1.3360898208526207e-06, 'epoch': 0.84} + + 84%|████████▍ | 6187/7378 [21:12:51<4:03:53, 12.29s/it] + 84%|████████▍ | 6188/7378 [21:13:04<4:03:44, 12.29s/it] + +{'loss': 0.505, 'learning_rate': 1.3338983623894696e-06, 'epoch': 0.84} + + 84%|████████▍ | 6188/7378 [21:13:04<4:03:44, 12.29s/it] + 84%|████████▍ | 6189/7378 [21:13:16<4:02:21, 12.23s/it] + +{'loss': 0.4348, 'learning_rate': 1.3317085741815972e-06, 'epoch': 0.84} + + 84%|████████▍ | 6189/7378 [21:13:16<4:02:21, 12.23s/it] + 84%|████████▍ | 6190/7378 [21:13:28<4:03:41, 12.31s/it] + +{'loss': 0.4024, 'learning_rate': 1.3295204566510511e-06, 'epoch': 0.84} + + 84%|████████▍ | 6190/7378 [21:13:28<4:03:41, 12.31s/it] + 84%|████████▍ | 6191/7378 [21:13:41<4:04:27, 12.36s/it] + +{'loss': 0.4477, 'learning_rate': 1.3273340102195532e-06, 'epoch': 0.84} + + 84%|████████▍ | 6191/7378 [21:13:41<4:04:27, 12.36s/it] + 84%|████████▍ | 6192/7378 [21:13:53<4:04:50, 12.39s/it] + +{'loss': 0.465, 'learning_rate': 1.3251492353085116e-06, 'epoch': 0.84} + + 84%|████████▍ | 6192/7378 [21:13:53<4:04:50, 12.39s/it] + 84%|████████▍ | 6193/7378 [21:14:05<4:01:07, 12.21s/it] + +{'loss': 0.4081, 'learning_rate': 1.3229661323390042e-06, 'epoch': 0.84} + + 84%|████████▍ | 6193/7378 [21:14:05<4:01:07, 12.21s/it] + 84%|████████▍ | 6194/7378 [21:14:17<4:01:36, 12.24s/it] + +{'loss': 0.4268, 'learning_rate': 1.320784701731792e-06, 'epoch': 0.84} + + 84%|████████▍ | 6194/7378 [21:14:17<4:01:36, 12.24s/it] + 84%|████████▍ | 6195/7378 [21:14:30<4:02:25, 12.30s/it] + +{'loss': 0.4485, 
'learning_rate': 1.3186049439073112e-06, 'epoch': 0.84} + + 84%|████████▍ | 6195/7378 [21:14:30<4:02:25, 12.30s/it] + 84%|████████▍ | 6196/7378 [21:14:42<4:02:05, 12.29s/it] + +{'loss': 0.3612, 'learning_rate': 1.3164268592856722e-06, 'epoch': 0.84} + + 84%|████████▍ | 6196/7378 [21:14:42<4:02:05, 12.29s/it] + 84%|████████▍ | 6197/7378 [21:14:54<4:00:11, 12.20s/it] + +{'loss': 0.3785, 'learning_rate': 1.3142504482866714e-06, 'epoch': 0.84} + + 84%|████████▍ | 6197/7378 [21:14:54<4:00:11, 12.20s/it] + 84%|████████▍ | 6198/7378 [21:15:07<4:03:10, 12.36s/it] + +{'loss': 0.4735, 'learning_rate': 1.3120757113297777e-06, 'epoch': 0.84} + + 84%|████████▍ | 6198/7378 [21:15:07<4:03:10, 12.36s/it] + 84%|████████▍ | 6199/7378 [21:15:19<4:00:54, 12.26s/it] + +{'loss': 0.4212, 'learning_rate': 1.3099026488341348e-06, 'epoch': 0.84} + + 84%|████████▍ | 6199/7378 [21:15:19<4:00:54, 12.26s/it] + 84%|████████▍ | 6200/7378 [21:15:31<4:00:02, 12.23s/it] + +{'loss': 0.4358, 'learning_rate': 1.3077312612185688e-06, 'epoch': 0.84} + + 84%|████████▍ | 6200/7378 [21:15:31<4:00:02, 12.23s/it] + 84%|████████▍ | 6201/7378 [21:15:43<3:58:16, 12.15s/it] + +{'loss': 0.4209, 'learning_rate': 1.3055615489015771e-06, 'epoch': 0.84} + + 84%|████████▍ | 6201/7378 [21:15:43<3:58:16, 12.15s/it] + 84%|████████▍ | 6202/7378 [21:15:55<3:56:23, 12.06s/it] + +{'loss': 0.4016, 'learning_rate': 1.303393512301342e-06, 'epoch': 0.84} + + 84%|████████▍ | 6202/7378 [21:15:55<3:56:23, 12.06s/it] + 84%|████████▍ | 6203/7378 [21:16:07<3:57:57, 12.15s/it] + +{'loss': 0.4055, 'learning_rate': 1.3012271518357177e-06, 'epoch': 0.84} + + 84%|████████▍ | 6203/7378 [21:16:07<3:57:57, 12.15s/it] + 84%|████████▍ | 6204/7378 [21:16:19<3:58:34, 12.19s/it] + +{'loss': 0.442, 'learning_rate': 1.2990624679222341e-06, 'epoch': 0.84} + + 84%|████████▍ | 6204/7378 [21:16:19<3:58:34, 12.19s/it] + 84%|████████▍ | 6205/7378 [21:16:32<3:58:39, 12.21s/it] + +{'loss': 0.3896, 'learning_rate': 1.2968994609781005e-06, 'epoch': 0.84} + + 
84%|████████▍ | 6205/7378 [21:16:32<3:58:39, 12.21s/it] + 84%|████████▍ | 6206/7378 [21:16:44<3:59:19, 12.25s/it] + +{'loss': 0.3875, 'learning_rate': 1.2947381314202046e-06, 'epoch': 0.84} + + 84%|████████▍ | 6206/7378 [21:16:44<3:59:19, 12.25s/it] + 84%|████████▍ | 6207/7378 [21:16:56<3:58:06, 12.20s/it] + +{'loss': 0.3926, 'learning_rate': 1.2925784796651086e-06, 'epoch': 0.84} + + 84%|████████▍ | 6207/7378 [21:16:56<3:58:06, 12.20s/it] + 84%|████████▍ | 6208/7378 [21:17:09<4:00:00, 12.31s/it] + +{'loss': 0.4482, 'learning_rate': 1.2904205061290497e-06, 'epoch': 0.84} + + 84%|████████▍ | 6208/7378 [21:17:09<4:00:00, 12.31s/it] + 84%|████████▍ | 6209/7378 [21:17:21<3:59:42, 12.30s/it] + +{'loss': 0.4498, 'learning_rate': 1.2882642112279454e-06, 'epoch': 0.84} + + 84%|████████▍ | 6209/7378 [21:17:21<3:59:42, 12.30s/it] + 84%|████████▍ | 6210/7378 [21:17:33<3:59:39, 12.31s/it] + +{'loss': 0.3588, 'learning_rate': 1.286109595377384e-06, 'epoch': 0.84} + + 84%|████████▍ | 6210/7378 [21:17:33<3:59:39, 12.31s/it] + 84%|████████▍ | 6211/7378 [21:17:45<3:56:29, 12.16s/it] + +{'loss': 0.4051, 'learning_rate': 1.283956658992639e-06, 'epoch': 0.84} + + 84%|████████▍ | 6211/7378 [21:17:45<3:56:29, 12.16s/it] + 84%|████████▍ | 6212/7378 [21:17:57<3:56:30, 12.17s/it] + +{'loss': 0.4414, 'learning_rate': 1.2818054024886517e-06, 'epoch': 0.84} + + 84%|████████▍ | 6212/7378 [21:17:57<3:56:30, 12.17s/it] + 84%|████████▍ | 6213/7378 [21:18:09<3:55:20, 12.12s/it] + +{'loss': 0.4743, 'learning_rate': 1.279655826280045e-06, 'epoch': 0.84} + + 84%|████████▍ | 6213/7378 [21:18:09<3:55:20, 12.12s/it] + 84%|████████▍ | 6214/7378 [21:18:21<3:54:26, 12.08s/it] + +{'loss': 0.3764, 'learning_rate': 1.2775079307811133e-06, 'epoch': 0.84} + + 84%|████████▍ | 6214/7378 [21:18:21<3:54:26, 12.08s/it] + 84%|████████▍ | 6215/7378 [21:18:34<3:57:56, 12.28s/it] + +{'loss': 0.4339, 'learning_rate': 1.275361716405834e-06, 'epoch': 0.84} + + 84%|████████▍ | 6215/7378 [21:18:34<3:57:56, 12.28s/it] + 
84%|████████▍ | 6216/7378 [21:18:46<3:55:57, 12.18s/it] + +{'loss': 0.4157, 'learning_rate': 1.2732171835678531e-06, 'epoch': 0.84} + + 84%|████████▍ | 6216/7378 [21:18:46<3:55:57, 12.18s/it] + 84%|████████▍ | 6217/7378 [21:18:58<3:56:03, 12.20s/it] + +{'loss': 0.4355, 'learning_rate': 1.2710743326804974e-06, 'epoch': 0.84} + + 84%|████████▍ | 6217/7378 [21:18:58<3:56:03, 12.20s/it] + 84%|████████▍ | 6218/7378 [21:19:11<3:58:17, 12.33s/it] + +{'loss': 0.4379, 'learning_rate': 1.268933164156767e-06, 'epoch': 0.84} + + 84%|████████▍ | 6218/7378 [21:19:11<3:58:17, 12.33s/it] + 84%|████████▍ | 6219/7378 [21:19:23<3:59:17, 12.39s/it] + +{'loss': 0.431, 'learning_rate': 1.266793678409336e-06, 'epoch': 0.84} + + 84%|████████▍ | 6219/7378 [21:19:23<3:59:17, 12.39s/it] + 84%|████████▍ | 6220/7378 [21:19:36<3:58:41, 12.37s/it] + +{'loss': 0.4257, 'learning_rate': 1.2646558758505622e-06, 'epoch': 0.84} + + 84%|████████▍ | 6220/7378 [21:19:36<3:58:41, 12.37s/it] + 84%|████████▍ | 6221/7378 [21:19:48<3:56:58, 12.29s/it] + +{'loss': 0.4305, 'learning_rate': 1.2625197568924696e-06, 'epoch': 0.84} + + 84%|████████▍ | 6221/7378 [21:19:48<3:56:58, 12.29s/it] + 84%|████████▍ | 6222/7378 [21:20:00<3:59:17, 12.42s/it] + +{'loss': 0.4025, 'learning_rate': 1.260385321946761e-06, 'epoch': 0.84} + + 84%|████████▍ | 6222/7378 [21:20:00<3:59:17, 12.42s/it] + 84%|████████▍ | 6223/7378 [21:20:13<3:59:32, 12.44s/it] + +{'loss': 0.4137, 'learning_rate': 1.2582525714248195e-06, 'epoch': 0.84} + + 84%|████████▍ | 6223/7378 [21:20:13<3:59:32, 12.44s/it] + 84%|████████▍ | 6224/7378 [21:20:25<3:57:20, 12.34s/it] + +{'loss': 0.4368, 'learning_rate': 1.2561215057376953e-06, 'epoch': 0.84} + + 84%|████████▍ | 6224/7378 [21:20:25<3:57:20, 12.34s/it] + 84%|████████▍ | 6225/7378 [21:20:38<3:57:56, 12.38s/it] + +{'loss': 0.4102, 'learning_rate': 1.2539921252961207e-06, 'epoch': 0.84} + + 84%|████████▍ | 6225/7378 [21:20:38<3:57:56, 12.38s/it] + 84%|████████▍ | 6226/7378 [21:20:50<3:56:39, 12.33s/it] + 
+{'loss': 0.4807, 'learning_rate': 1.2518644305104987e-06, 'epoch': 0.84} + + 84%|████████▍ | 6226/7378 [21:20:50<3:56:39, 12.33s/it] + 84%|████████▍ | 6227/7378 [21:21:02<3:57:36, 12.39s/it] + +{'loss': 0.4592, 'learning_rate': 1.2497384217909102e-06, 'epoch': 0.84} + + 84%|████████▍ | 6227/7378 [21:21:02<3:57:36, 12.39s/it] + 84%|████████▍ | 6228/7378 [21:21:15<3:58:11, 12.43s/it] + +{'loss': 0.4844, 'learning_rate': 1.2476140995471097e-06, 'epoch': 0.84} + + 84%|████████▍ | 6228/7378 [21:21:15<3:58:11, 12.43s/it] + 84%|████████▍ | 6229/7378 [21:21:27<3:57:51, 12.42s/it] + +{'loss': 0.4689, 'learning_rate': 1.2454914641885251e-06, 'epoch': 0.84} + + 84%|████████▍ | 6229/7378 [21:21:27<3:57:51, 12.42s/it] + 84%|████████▍ | 6230/7378 [21:21:39<3:56:42, 12.37s/it] + +{'loss': 0.381, 'learning_rate': 1.2433705161242638e-06, 'epoch': 0.84} + + 84%|████████▍ | 6230/7378 [21:21:39<3:56:42, 12.37s/it] + 84%|████████▍ | 6231/7378 [21:21:52<3:56:59, 12.40s/it] + +{'loss': 0.3781, 'learning_rate': 1.241251255763105e-06, 'epoch': 0.84} + + 84%|████████▍ | 6231/7378 [21:21:52<3:56:59, 12.40s/it] + 84%|████████▍ | 6232/7378 [21:22:04<3:56:49, 12.40s/it] + +{'loss': 0.4294, 'learning_rate': 1.2391336835135015e-06, 'epoch': 0.84} + + 84%|████████▍ | 6232/7378 [21:22:04<3:56:49, 12.40s/it] + 84%|████████▍ | 6233/7378 [21:22:17<3:56:04, 12.37s/it] + +{'loss': 0.4311, 'learning_rate': 1.237017799783582e-06, 'epoch': 0.84} + + 84%|████████▍ | 6233/7378 [21:22:17<3:56:04, 12.37s/it] + 84%|████████▍ | 6234/7378 [21:22:29<3:53:39, 12.25s/it] + +{'loss': 0.5006, 'learning_rate': 1.2349036049811513e-06, 'epoch': 0.84} + + 84%|████████▍ | 6234/7378 [21:22:29<3:53:39, 12.25s/it] + 85%|████████▍ | 6235/7378 [21:22:41<3:52:35, 12.21s/it] + +{'loss': 0.3775, 'learning_rate': 1.2327910995136883e-06, 'epoch': 0.85} + + 85%|████████▍ | 6235/7378 [21:22:41<3:52:35, 12.21s/it] + 85%|████████▍ | 6236/7378 [21:22:53<3:54:00, 12.29s/it] + +{'loss': 0.4335, 'learning_rate': 1.2306802837883436e-06, 
'epoch': 0.85} + + 85%|████████▍ | 6236/7378 [21:22:53<3:54:00, 12.29s/it] + 85%|████████▍ | 6237/7378 [21:23:06<3:55:51, 12.40s/it] + +{'loss': 0.4375, 'learning_rate': 1.228571158211943e-06, 'epoch': 0.85} + + 85%|████████▍ | 6237/7378 [21:23:06<3:55:51, 12.40s/it] + 85%|████████▍ | 6238/7378 [21:23:18<3:54:06, 12.32s/it] + +{'loss': 0.3626, 'learning_rate': 1.2264637231909871e-06, 'epoch': 0.85} + + 85%|████████▍ | 6238/7378 [21:23:18<3:54:06, 12.32s/it] + 85%|████████▍ | 6239/7378 [21:23:30<3:52:34, 12.25s/it] + +{'loss': 0.4666, 'learning_rate': 1.2243579791316552e-06, 'epoch': 0.85} + + 85%|████████▍ | 6239/7378 [21:23:30<3:52:34, 12.25s/it] + 85%|████████▍ | 6240/7378 [21:23:43<3:54:45, 12.38s/it] + +{'loss': 0.489, 'learning_rate': 1.2222539264397925e-06, 'epoch': 0.85} + + 85%|████████▍ | 6240/7378 [21:23:43<3:54:45, 12.38s/it] + 85%|████████▍ | 6241/7378 [21:23:55<3:54:56, 12.40s/it] + +{'loss': 0.3683, 'learning_rate': 1.220151565520924e-06, 'epoch': 0.85} + + 85%|████████▍ | 6241/7378 [21:23:55<3:54:56, 12.40s/it] + 85%|████████▍ | 6242/7378 [21:24:07<3:53:30, 12.33s/it] + +{'loss': 0.4684, 'learning_rate': 1.218050896780244e-06, 'epoch': 0.85} + + 85%|████████▍ | 6242/7378 [21:24:07<3:53:30, 12.33s/it] + 85%|████████▍ | 6243/7378 [21:24:19<3:51:28, 12.24s/it] + +{'loss': 0.4038, 'learning_rate': 1.2159519206226268e-06, 'epoch': 0.85} + + 85%|████████▍ | 6243/7378 [21:24:19<3:51:28, 12.24s/it] + 85%|████████▍ | 6244/7378 [21:24:31<3:48:54, 12.11s/it] + +{'loss': 0.3818, 'learning_rate': 1.2138546374526172e-06, 'epoch': 0.85} + + 85%|████████▍ | 6244/7378 [21:24:31<3:48:54, 12.11s/it] + 85%|████████▍ | 6245/7378 [21:24:43<3:48:04, 12.08s/it] + +{'loss': 0.4753, 'learning_rate': 1.2117590476744311e-06, 'epoch': 0.85} + + 85%|████████▍ | 6245/7378 [21:24:43<3:48:04, 12.08s/it] + 85%|████████▍ | 6246/7378 [21:24:56<3:50:34, 12.22s/it] + +{'loss': 0.3905, 'learning_rate': 1.2096651516919634e-06, 'epoch': 0.85} + + 85%|████████▍ | 6246/7378 [21:24:56<3:50:34, 
12.22s/it] + 85%|████████▍ | 6247/7378 [21:25:08<3:48:23, 12.12s/it] + +{'loss': 0.4781, 'learning_rate': 1.2075729499087752e-06, 'epoch': 0.85} + + 85%|████████▍ | 6247/7378 [21:25:08<3:48:23, 12.12s/it] + 85%|████████▍ | 6248/7378 [21:25:20<3:50:11, 12.22s/it] + +{'loss': 0.4591, 'learning_rate': 1.2054824427281108e-06, 'epoch': 0.85} + + 85%|████████▍ | 6248/7378 [21:25:20<3:50:11, 12.22s/it] + 85%|████████▍ | 6249/7378 [21:25:32<3:49:42, 12.21s/it] + +{'loss': 0.4129, 'learning_rate': 1.2033936305528815e-06, 'epoch': 0.85} + + 85%|████████▍ | 6249/7378 [21:25:32<3:49:42, 12.21s/it] + 85%|████████▍ | 6250/7378 [21:25:44<3:46:45, 12.06s/it] + +{'loss': 0.3947, 'learning_rate': 1.2013065137856716e-06, 'epoch': 0.85} + + 85%|████████▍ | 6250/7378 [21:25:44<3:46:45, 12.06s/it] + 85%|████████▍ | 6251/7378 [21:25:56<3:47:35, 12.12s/it] + +{'loss': 0.4285, 'learning_rate': 1.1992210928287385e-06, 'epoch': 0.85} + + 85%|████████▍ | 6251/7378 [21:25:56<3:47:35, 12.12s/it] + 85%|████████▍ | 6252/7378 [21:26:09<3:49:27, 12.23s/it] + +{'loss': 0.4279, 'learning_rate': 1.1971373680840182e-06, 'epoch': 0.85} + + 85%|████████▍ | 6252/7378 [21:26:09<3:49:27, 12.23s/it] + 85%|████████▍ | 6253/7378 [21:26:21<3:50:00, 12.27s/it] + +{'loss': 0.4158, 'learning_rate': 1.195055339953115e-06, 'epoch': 0.85} + + 85%|████████▍ | 6253/7378 [21:26:21<3:50:00, 12.27s/it] + 85%|████████▍ | 6254/7378 [21:26:33<3:49:32, 12.25s/it] + +{'loss': 0.4424, 'learning_rate': 1.1929750088373071e-06, 'epoch': 0.85} + + 85%|████████▍ | 6254/7378 [21:26:33<3:49:32, 12.25s/it] + 85%|████████▍ | 6255/7378 [21:26:45<3:48:01, 12.18s/it] + +{'loss': 0.4084, 'learning_rate': 1.1908963751375446e-06, 'epoch': 0.85} + + 85%|████████▍ | 6255/7378 [21:26:45<3:48:01, 12.18s/it] + 85%|████████▍ | 6256/7378 [21:26:58<3:48:18, 12.21s/it] + +{'loss': 0.454, 'learning_rate': 1.1888194392544504e-06, 'epoch': 0.85} + + 85%|████████▍ | 6256/7378 [21:26:58<3:48:18, 12.21s/it] + 85%|████████▍ | 6257/7378 [21:27:10<3:47:43, 
12.19s/it] + +{'loss': 0.4132, 'learning_rate': 1.1867442015883247e-06, 'epoch': 0.85} + + 85%|████████▍ | 6257/7378 [21:27:10<3:47:43, 12.19s/it] + 85%|████████▍ | 6258/7378 [21:27:22<3:49:09, 12.28s/it] + +{'loss': 0.4147, 'learning_rate': 1.1846706625391358e-06, 'epoch': 0.85} + + 85%|████████▍ | 6258/7378 [21:27:22<3:49:09, 12.28s/it] + 85%|████████▍ | 6259/7378 [21:27:34<3:48:20, 12.24s/it] + +{'loss': 0.4512, 'learning_rate': 1.1825988225065266e-06, 'epoch': 0.85} + + 85%|████████▍ | 6259/7378 [21:27:34<3:48:20, 12.24s/it] + 85%|████████▍ | 6260/7378 [21:27:47<3:49:59, 12.34s/it] + +{'loss': 0.4855, 'learning_rate': 1.18052868188981e-06, 'epoch': 0.85} + + 85%|████████▍ | 6260/7378 [21:27:47<3:49:59, 12.34s/it] + 85%|████████▍ | 6261/7378 [21:27:59<3:48:42, 12.29s/it] + +{'loss': 0.4724, 'learning_rate': 1.1784602410879708e-06, 'epoch': 0.85} + + 85%|████████▍ | 6261/7378 [21:27:59<3:48:42, 12.29s/it] + 85%|████████▍ | 6262/7378 [21:28:11<3:48:06, 12.26s/it] + +{'loss': 0.445, 'learning_rate': 1.1763935004996751e-06, 'epoch': 0.85} + + 85%|████████▍ | 6262/7378 [21:28:11<3:48:06, 12.26s/it] + 85%|████████▍ | 6263/7378 [21:28:23<3:46:53, 12.21s/it] + +{'loss': 0.4242, 'learning_rate': 1.1743284605232508e-06, 'epoch': 0.85} + + 85%|████████▍ | 6263/7378 [21:28:23<3:46:53, 12.21s/it] + 85%|████████▍ | 6264/7378 [21:28:36<3:46:33, 12.20s/it] + +{'loss': 0.4343, 'learning_rate': 1.1722651215567016e-06, 'epoch': 0.85} + + 85%|████████▍ | 6264/7378 [21:28:36<3:46:33, 12.20s/it] + 85%|████████▍ | 6265/7378 [21:28:48<3:46:37, 12.22s/it] + +{'loss': 0.4185, 'learning_rate': 1.1702034839977039e-06, 'epoch': 0.85} + + 85%|████████▍ | 6265/7378 [21:28:48<3:46:37, 12.22s/it] + 85%|████████▍ | 6266/7378 [21:29:00<3:46:04, 12.20s/it] + +{'loss': 0.4175, 'learning_rate': 1.1681435482436066e-06, 'epoch': 0.85} + + 85%|████████▍ | 6266/7378 [21:29:00<3:46:04, 12.20s/it] + 85%|████████▍ | 6267/7378 [21:29:12<3:47:15, 12.27s/it] + +{'loss': 0.3765, 'learning_rate': 
1.166085314691432e-06, 'epoch': 0.85} + + 85%|████████▍ | 6267/7378 [21:29:12<3:47:15, 12.27s/it] + 85%|████████▍ | 6268/7378 [21:29:25<3:47:56, 12.32s/it] + +{'loss': 0.4428, 'learning_rate': 1.1640287837378706e-06, 'epoch': 0.85} + + 85%|████████▍ | 6268/7378 [21:29:25<3:47:56, 12.32s/it] + 85%|████████▍ | 6269/7378 [21:29:37<3:47:47, 12.32s/it] + +{'loss': 0.3773, 'learning_rate': 1.1619739557792863e-06, 'epoch': 0.85} + + 85%|████████▍ | 6269/7378 [21:29:37<3:47:47, 12.32s/it] + 85%|████████▍ | 6270/7378 [21:29:53<4:04:56, 13.26s/it] + +{'loss': 0.4766, 'learning_rate': 1.159920831211715e-06, 'epoch': 0.85} + + 85%|████████▍ | 6270/7378 [21:29:53<4:04:56, 13.26s/it] + 85%|████████▍ | 6271/7378 [21:30:05<3:59:48, 13.00s/it] + +{'loss': 0.4719, 'learning_rate': 1.157869410430863e-06, 'epoch': 0.85} + + 85%|████████▍ | 6271/7378 [21:30:05<3:59:48, 13.00s/it] + 85%|████████▌ | 6272/7378 [21:30:17<3:55:42, 12.79s/it] + +{'loss': 0.4554, 'learning_rate': 1.155819693832112e-06, 'epoch': 0.85} + + 85%|████████▌ | 6272/7378 [21:30:17<3:55:42, 12.79s/it] + 85%|████████▌ | 6273/7378 [21:30:29<3:52:07, 12.60s/it] + +{'loss': 0.4912, 'learning_rate': 1.1537716818105126e-06, 'epoch': 0.85} + + 85%|████████▌ | 6273/7378 [21:30:29<3:52:07, 12.60s/it] + 85%|████████▌ | 6274/7378 [21:30:42<3:50:43, 12.54s/it] + +{'loss': 0.4322, 'learning_rate': 1.151725374760786e-06, 'epoch': 0.85} + + 85%|████████▌ | 6274/7378 [21:30:42<3:50:43, 12.54s/it] + 85%|████████▌ | 6275/7378 [21:30:54<3:49:31, 12.49s/it] + +{'loss': 0.4189, 'learning_rate': 1.1496807730773242e-06, 'epoch': 0.85} + + 85%|████████▌ | 6275/7378 [21:30:54<3:49:31, 12.49s/it] + 85%|████████▌ | 6276/7378 [21:31:07<3:50:36, 12.56s/it] + +{'loss': 0.4332, 'learning_rate': 1.1476378771541953e-06, 'epoch': 0.85} + + 85%|████████▌ | 6276/7378 [21:31:07<3:50:36, 12.56s/it] + 85%|████████▌ | 6277/7378 [21:31:19<3:48:31, 12.45s/it] + +{'loss': 0.4803, 'learning_rate': 1.1455966873851343e-06, 'epoch': 0.85} + + 85%|████████▌ | 
6277/7378 [21:31:19<3:48:31, 12.45s/it] + 85%|████████▌ | 6278/7378 [21:31:31<3:45:33, 12.30s/it] + +{'loss': 0.4092, 'learning_rate': 1.143557204163549e-06, 'epoch': 0.85} + + 85%|████████▌ | 6278/7378 [21:31:31<3:45:33, 12.30s/it] + 85%|████████▌ | 6279/7378 [21:31:43<3:43:09, 12.18s/it] + +{'loss': 0.4351, 'learning_rate': 1.1415194278825159e-06, 'epoch': 0.85} + + 85%|████████▌ | 6279/7378 [21:31:43<3:43:09, 12.18s/it] + 85%|████████▌ | 6280/7378 [21:31:55<3:42:27, 12.16s/it] + +{'loss': 0.3959, 'learning_rate': 1.1394833589347843e-06, 'epoch': 0.85} + + 85%|████████▌ | 6280/7378 [21:31:55<3:42:27, 12.16s/it] + 85%|████████▌ | 6281/7378 [21:32:08<3:44:59, 12.31s/it] + +{'loss': 0.3905, 'learning_rate': 1.1374489977127779e-06, 'epoch': 0.85} + + 85%|████████▌ | 6281/7378 [21:32:08<3:44:59, 12.31s/it] + 85%|████████▌ | 6282/7378 [21:32:20<3:44:50, 12.31s/it] + +{'loss': 0.4723, 'learning_rate': 1.1354163446085864e-06, 'epoch': 0.85} + + 85%|████████▌ | 6282/7378 [21:32:20<3:44:50, 12.31s/it] + 85%|████████▌ | 6283/7378 [21:32:32<3:41:37, 12.14s/it] + +{'loss': 0.3797, 'learning_rate': 1.133385400013971e-06, 'epoch': 0.85} + + 85%|████████▌ | 6283/7378 [21:32:32<3:41:37, 12.14s/it] + 85%|████████▌ | 6284/7378 [21:32:44<3:41:34, 12.15s/it] + +{'loss': 0.4037, 'learning_rate': 1.131356164320363e-06, 'epoch': 0.85} + + 85%|████████▌ | 6284/7378 [21:32:44<3:41:34, 12.15s/it] + 85%|████████▌ | 6285/7378 [21:32:57<3:44:28, 12.32s/it] + +{'loss': 0.4431, 'learning_rate': 1.1293286379188695e-06, 'epoch': 0.85} + + 85%|████████▌ | 6285/7378 [21:32:57<3:44:28, 12.32s/it] + 85%|████████▌ | 6286/7378 [21:33:09<3:45:10, 12.37s/it] + +{'loss': 0.4419, 'learning_rate': 1.1273028212002623e-06, 'epoch': 0.85} + + 85%|████████▌ | 6286/7378 [21:33:09<3:45:10, 12.37s/it] + 85%|████████▌ | 6287/7378 [21:33:22<3:45:30, 12.40s/it] + +{'loss': 0.4683, 'learning_rate': 1.1252787145549871e-06, 'epoch': 0.85} + + 85%|████████▌ | 6287/7378 [21:33:22<3:45:30, 12.40s/it] + 85%|████████▌ | 
6288/7378 [21:33:34<3:42:43, 12.26s/it] + +{'loss': 0.4129, 'learning_rate': 1.123256318373157e-06, 'epoch': 0.85} + + 85%|████████▌ | 6288/7378 [21:33:34<3:42:43, 12.26s/it] + 85%|████████▌ | 6289/7378 [21:33:46<3:41:18, 12.19s/it] + +{'loss': 0.4049, 'learning_rate': 1.1212356330445562e-06, 'epoch': 0.85} + + 85%|████████▌ | 6289/7378 [21:33:46<3:41:18, 12.19s/it] + 85%|████████▌ | 6290/7378 [21:33:57<3:38:58, 12.08s/it] + +{'loss': 0.4176, 'learning_rate': 1.1192166589586428e-06, 'epoch': 0.85} + + 85%|████████▌ | 6290/7378 [21:33:58<3:38:58, 12.08s/it] + 85%|████████▌ | 6291/7378 [21:34:10<3:41:58, 12.25s/it] + +{'loss': 0.4658, 'learning_rate': 1.1171993965045424e-06, 'epoch': 0.85} + + 85%|████████▌ | 6291/7378 [21:34:10<3:41:58, 12.25s/it] + 85%|████████▌ | 6292/7378 [21:34:22<3:40:09, 12.16s/it] + +{'loss': 0.404, 'learning_rate': 1.1151838460710495e-06, 'epoch': 0.85} + + 85%|████████▌ | 6292/7378 [21:34:22<3:40:09, 12.16s/it] + 85%|████████▌ | 6293/7378 [21:34:34<3:40:17, 12.18s/it] + +{'loss': 0.421, 'learning_rate': 1.113170008046629e-06, 'epoch': 0.85} + + 85%|████████▌ | 6293/7378 [21:34:34<3:40:17, 12.18s/it] + 85%|████████▌ | 6294/7378 [21:34:46<3:38:57, 12.12s/it] + +{'loss': 0.4147, 'learning_rate': 1.1111578828194192e-06, 'epoch': 0.85} + + 85%|████████▌ | 6294/7378 [21:34:46<3:38:57, 12.12s/it] + 85%|████████▌ | 6295/7378 [21:34:58<3:39:08, 12.14s/it] + +{'loss': 0.4205, 'learning_rate': 1.1091474707772242e-06, 'epoch': 0.85} + + 85%|████████▌ | 6295/7378 [21:34:58<3:39:08, 12.14s/it] + 85%|████████▌ | 6296/7378 [21:35:11<3:41:57, 12.31s/it] + +{'loss': 0.4159, 'learning_rate': 1.107138772307519e-06, 'epoch': 0.85} + + 85%|████████▌ | 6296/7378 [21:35:11<3:41:57, 12.31s/it] + 85%|████████▌ | 6297/7378 [21:35:23<3:39:55, 12.21s/it] + +{'loss': 0.458, 'learning_rate': 1.10513178779745e-06, 'epoch': 0.85} + + 85%|████████▌ | 6297/7378 [21:35:23<3:39:55, 12.21s/it] + 85%|████████▌ | 6298/7378 [21:35:35<3:40:25, 12.25s/it] + +{'loss': 0.4457, 
'learning_rate': 1.103126517633829e-06, 'epoch': 0.85} + + 85%|████████▌ | 6298/7378 [21:35:36<3:40:25, 12.25s/it] + 85%|████████▌ | 6299/7378 [21:35:47<3:38:47, 12.17s/it] + +{'loss': 0.4687, 'learning_rate': 1.101122962203144e-06, 'epoch': 0.85} + + 85%|████████▌ | 6299/7378 [21:35:47<3:38:47, 12.17s/it] + 85%|████████▌ | 6300/7378 [21:35:59<3:36:34, 12.05s/it] + +{'loss': 0.4437, 'learning_rate': 1.0991211218915475e-06, 'epoch': 0.85} + + 85%|████████▌ | 6300/7378 [21:35:59<3:36:34, 12.05s/it] + 85%|████████▌ | 6301/7378 [21:36:11<3:36:36, 12.07s/it] + +{'loss': 0.3819, 'learning_rate': 1.097120997084864e-06, 'epoch': 0.85} + + 85%|████████▌ | 6301/7378 [21:36:11<3:36:36, 12.07s/it] + 85%|████████▌ | 6302/7378 [21:36:23<3:36:18, 12.06s/it] + +{'loss': 0.4971, 'learning_rate': 1.0951225881685823e-06, 'epoch': 0.85} + + 85%|████████▌ | 6302/7378 [21:36:23<3:36:18, 12.06s/it] + 85%|████████▌ | 6303/7378 [21:36:36<3:36:55, 12.11s/it] + +{'loss': 0.3995, 'learning_rate': 1.09312589552787e-06, 'epoch': 0.85} + + 85%|████████▌ | 6303/7378 [21:36:36<3:36:55, 12.11s/it] + 85%|████████▌ | 6304/7378 [21:36:48<3:38:19, 12.20s/it] + +{'loss': 0.3964, 'learning_rate': 1.091130919547555e-06, 'epoch': 0.85} + + 85%|████████▌ | 6304/7378 [21:36:48<3:38:19, 12.20s/it] + 85%|████████▌ | 6305/7378 [21:37:00<3:39:18, 12.26s/it] + +{'loss': 0.3998, 'learning_rate': 1.0891376606121385e-06, 'epoch': 0.85} + + 85%|████████▌ | 6305/7378 [21:37:00<3:39:18, 12.26s/it] + 85%|████████▌ | 6306/7378 [21:37:13<3:40:57, 12.37s/it] + +{'loss': 0.4616, 'learning_rate': 1.0871461191057885e-06, 'epoch': 0.85} + + 85%|████████▌ | 6306/7378 [21:37:13<3:40:57, 12.37s/it] + 85%|████████▌ | 6307/7378 [21:37:26<3:42:15, 12.45s/it] + +{'loss': 0.3759, 'learning_rate': 1.085156295412344e-06, 'epoch': 0.85} + + 85%|████████▌ | 6307/7378 [21:37:26<3:42:15, 12.45s/it] + 85%|████████▌ | 6308/7378 [21:37:38<3:39:14, 12.29s/it] + +{'loss': 0.4777, 'learning_rate': 1.0831681899153135e-06, 'epoch': 0.85} + + 
85%|████████▌ | 6308/7378 [21:37:38<3:39:14, 12.29s/it] + 86%|████████▌ | 6309/7378 [21:37:50<3:39:43, 12.33s/it] + +{'loss': 0.4626, 'learning_rate': 1.0811818029978715e-06, 'epoch': 0.86} + + 86%|████████▌ | 6309/7378 [21:37:50<3:39:43, 12.33s/it] + 86%|████████▌ | 6310/7378 [21:38:02<3:37:57, 12.25s/it] + +{'loss': 0.3826, 'learning_rate': 1.0791971350428654e-06, 'epoch': 0.86} + + 86%|████████▌ | 6310/7378 [21:38:02<3:37:57, 12.25s/it] + 86%|████████▌ | 6311/7378 [21:38:14<3:36:34, 12.18s/it] + +{'loss': 0.4856, 'learning_rate': 1.0772141864328078e-06, 'epoch': 0.86} + + 86%|████████▌ | 6311/7378 [21:38:14<3:36:34, 12.18s/it] + 86%|████████▌ | 6312/7378 [21:38:27<3:37:53, 12.26s/it] + +{'loss': 0.4763, 'learning_rate': 1.0752329575498789e-06, 'epoch': 0.86} + + 86%|████████▌ | 6312/7378 [21:38:27<3:37:53, 12.26s/it] + 86%|████████▌ | 6313/7378 [21:38:39<3:38:52, 12.33s/it] + +{'loss': 0.4483, 'learning_rate': 1.0732534487759327e-06, 'epoch': 0.86} + + 86%|████████▌ | 6313/7378 [21:38:39<3:38:52, 12.33s/it] + 86%|████████▌ | 6314/7378 [21:38:51<3:37:45, 12.28s/it] + +{'loss': 0.4129, 'learning_rate': 1.0712756604924868e-06, 'epoch': 0.86} + + 86%|████████▌ | 6314/7378 [21:38:51<3:37:45, 12.28s/it] + 86%|████████▌ | 6315/7378 [21:39:04<3:39:08, 12.37s/it] + +{'loss': 0.4703, 'learning_rate': 1.0692995930807292e-06, 'epoch': 0.86} + + 86%|████████▌ | 6315/7378 [21:39:04<3:39:08, 12.37s/it] + 86%|████████▌ | 6316/7378 [21:39:16<3:36:56, 12.26s/it] + +{'loss': 0.4441, 'learning_rate': 1.0673252469215155e-06, 'epoch': 0.86} + + 86%|████████▌ | 6316/7378 [21:39:16<3:36:56, 12.26s/it] + 86%|████████▌ | 6317/7378 [21:39:28<3:37:16, 12.29s/it] + +{'loss': 0.4432, 'learning_rate': 1.0653526223953692e-06, 'epoch': 0.86} + + 86%|████████▌ | 6317/7378 [21:39:28<3:37:16, 12.29s/it] + 86%|████████▌ | 6318/7378 [21:39:41<3:37:58, 12.34s/it] + +{'loss': 0.4188, 'learning_rate': 1.0633817198824859e-06, 'epoch': 0.86} + + 86%|████████▌ | 6318/7378 [21:39:41<3:37:58, 12.34s/it] + 
86%|████████▌ | 6319/7378 [21:39:53<3:37:00, 12.29s/it] + +{'loss': 0.41, 'learning_rate': 1.0614125397627229e-06, 'epoch': 0.86} + + 86%|████████▌ | 6319/7378 [21:39:53<3:37:00, 12.29s/it] + 86%|████████▌ | 6320/7378 [21:40:05<3:36:55, 12.30s/it] + +{'loss': 0.4585, 'learning_rate': 1.0594450824156111e-06, 'epoch': 0.86} + + 86%|████████▌ | 6320/7378 [21:40:05<3:36:55, 12.30s/it] + 86%|████████▌ | 6321/7378 [21:40:17<3:36:34, 12.29s/it] + +{'loss': 0.3787, 'learning_rate': 1.057479348220346e-06, 'epoch': 0.86} + + 86%|████████▌ | 6321/7378 [21:40:17<3:36:34, 12.29s/it] + 86%|████████▌ | 6322/7378 [21:40:29<3:35:05, 12.22s/it] + +{'loss': 0.397, 'learning_rate': 1.055515337555789e-06, 'epoch': 0.86} + + 86%|████████▌ | 6322/7378 [21:40:29<3:35:05, 12.22s/it] + 86%|████████▌ | 6323/7378 [21:40:42<3:37:23, 12.36s/it] + +{'loss': 0.4299, 'learning_rate': 1.0535530508004789e-06, 'epoch': 0.86} + + 86%|████████▌ | 6323/7378 [21:40:42<3:37:23, 12.36s/it] + 86%|████████▌ | 6324/7378 [21:40:55<3:40:12, 12.54s/it] + +{'loss': 0.4335, 'learning_rate': 1.051592488332611e-06, 'epoch': 0.86} + + 86%|████████▌ | 6324/7378 [21:40:55<3:40:12, 12.54s/it] + 86%|████████▌ | 6325/7378 [21:41:07<3:37:36, 12.40s/it] + +{'loss': 0.4276, 'learning_rate': 1.0496336505300552e-06, 'epoch': 0.86} + + 86%|████████▌ | 6325/7378 [21:41:07<3:37:36, 12.40s/it] + 86%|████████▌ | 6326/7378 [21:41:20<3:37:04, 12.38s/it] + +{'loss': 0.4863, 'learning_rate': 1.0476765377703435e-06, 'epoch': 0.86} + + 86%|████████▌ | 6326/7378 [21:41:20<3:37:04, 12.38s/it] + 86%|████████▌ | 6327/7378 [21:41:32<3:36:44, 12.37s/it] + +{'loss': 0.4352, 'learning_rate': 1.045721150430683e-06, 'epoch': 0.86} + + 86%|████████▌ | 6327/7378 [21:41:32<3:36:44, 12.37s/it] + 86%|████████▌ | 6328/7378 [21:41:44<3:36:43, 12.38s/it] + +{'loss': 0.3653, 'learning_rate': 1.0437674888879424e-06, 'epoch': 0.86} + + 86%|████████▌ | 6328/7378 [21:41:44<3:36:43, 12.38s/it] + 86%|████████▌ | 6329/7378 [21:41:57<3:35:48, 12.34s/it] + 
+{'loss': 0.4282, 'learning_rate': 1.0418155535186591e-06, 'epoch': 0.86} + + 86%|████████▌ | 6329/7378 [21:41:57<3:35:48, 12.34s/it] + 86%|████████▌ | 6330/7378 [21:42:09<3:35:33, 12.34s/it] + +{'loss': 0.4847, 'learning_rate': 1.039865344699037e-06, 'epoch': 0.86} + + 86%|████████▌ | 6330/7378 [21:42:09<3:35:33, 12.34s/it] + 86%|████████▌ | 6331/7378 [21:42:21<3:33:39, 12.24s/it] + +{'loss': 0.4841, 'learning_rate': 1.0379168628049475e-06, 'epoch': 0.86} + + 86%|████████▌ | 6331/7378 [21:42:21<3:33:39, 12.24s/it] + 86%|████████▌ | 6332/7378 [21:42:33<3:34:10, 12.29s/it] + +{'loss': 0.4124, 'learning_rate': 1.0359701082119345e-06, 'epoch': 0.86} + + 86%|████████▌ | 6332/7378 [21:42:33<3:34:10, 12.29s/it] + 86%|████████▌ | 6333/7378 [21:42:45<3:31:54, 12.17s/it] + +{'loss': 0.4287, 'learning_rate': 1.0340250812952e-06, 'epoch': 0.86} + + 86%|████████▌ | 6333/7378 [21:42:45<3:31:54, 12.17s/it] + 86%|████████▌ | 6334/7378 [21:42:57<3:30:49, 12.12s/it] + +{'loss': 0.3924, 'learning_rate': 1.0320817824296202e-06, 'epoch': 0.86} + + 86%|████████▌ | 6334/7378 [21:42:57<3:30:49, 12.12s/it] + 86%|████████▌ | 6335/7378 [21:43:09<3:31:06, 12.14s/it] + +{'loss': 0.3975, 'learning_rate': 1.030140211989733e-06, 'epoch': 0.86} + + 86%|████████▌ | 6335/7378 [21:43:09<3:31:06, 12.14s/it] + 86%|████████▌ | 6336/7378 [21:43:21<3:30:39, 12.13s/it] + +{'loss': 0.4834, 'learning_rate': 1.0282003703497478e-06, 'epoch': 0.86} + + 86%|████████▌ | 6336/7378 [21:43:21<3:30:39, 12.13s/it] + 86%|████████▌ | 6337/7378 [21:43:34<3:33:00, 12.28s/it] + +{'loss': 0.408, 'learning_rate': 1.026262257883538e-06, 'epoch': 0.86} + + 86%|████████▌ | 6337/7378 [21:43:34<3:33:00, 12.28s/it] + 86%|████████▌ | 6338/7378 [21:43:46<3:31:05, 12.18s/it] + +{'loss': 0.4363, 'learning_rate': 1.0243258749646445e-06, 'epoch': 0.86} + + 86%|████████▌ | 6338/7378 [21:43:46<3:31:05, 12.18s/it] + 86%|████████▌ | 6339/7378 [21:43:59<3:32:36, 12.28s/it] + +{'loss': 0.4549, 'learning_rate': 1.0223912219662746e-06, 
'epoch': 0.86} + + 86%|████████▌ | 6339/7378 [21:43:59<3:32:36, 12.28s/it] + 86%|████████▌ | 6340/7378 [21:44:11<3:32:45, 12.30s/it] + +{'loss': 0.4721, 'learning_rate': 1.020458299261301e-06, 'epoch': 0.86} + + 86%|████████▌ | 6340/7378 [21:44:11<3:32:45, 12.30s/it] + 86%|████████▌ | 6341/7378 [21:44:25<3:39:48, 12.72s/it] + +{'loss': 0.4977, 'learning_rate': 1.0185271072222668e-06, 'epoch': 0.86} + + 86%|████████▌ | 6341/7378 [21:44:25<3:39:48, 12.72s/it] + 86%|████████▌ | 6342/7378 [21:44:37<3:37:27, 12.59s/it] + +{'loss': 0.4706, 'learning_rate': 1.0165976462213779e-06, 'epoch': 0.86} + + 86%|████████▌ | 6342/7378 [21:44:37<3:37:27, 12.59s/it] + 86%|████████▌ | 6343/7378 [21:44:49<3:36:19, 12.54s/it] + +{'loss': 0.4971, 'learning_rate': 1.0146699166305073e-06, 'epoch': 0.86} + + 86%|████████▌ | 6343/7378 [21:44:49<3:36:19, 12.54s/it] + 86%|████████▌ | 6344/7378 [21:45:01<3:34:20, 12.44s/it] + +{'loss': 0.4734, 'learning_rate': 1.0127439188211941e-06, 'epoch': 0.86} + + 86%|████████▌ | 6344/7378 [21:45:02<3:34:20, 12.44s/it] + 86%|████████▌ | 6345/7378 [21:45:14<3:34:07, 12.44s/it] + +{'loss': 0.4757, 'learning_rate': 1.0108196531646464e-06, 'epoch': 0.86} + + 86%|████████▌ | 6345/7378 [21:45:14<3:34:07, 12.44s/it] + 86%|████████▌ | 6346/7378 [21:45:29<3:49:14, 13.33s/it] + +{'loss': 0.4694, 'learning_rate': 1.0088971200317344e-06, 'epoch': 0.86} + + 86%|████████▌ | 6346/7378 [21:45:29<3:49:14, 13.33s/it] + 86%|████████▌ | 6347/7378 [21:45:42<3:44:46, 13.08s/it] + +{'loss': 0.4982, 'learning_rate': 1.006976319792996e-06, 'epoch': 0.86} + + 86%|████████▌ | 6347/7378 [21:45:42<3:44:46, 13.08s/it] + 86%|████████▌ | 6348/7378 [21:45:57<3:56:59, 13.81s/it] + +{'loss': 0.3822, 'learning_rate': 1.0050572528186375e-06, 'epoch': 0.86} + + 86%|████████▌ | 6348/7378 [21:45:57<3:56:59, 13.81s/it] + 86%|████████▌ | 6349/7378 [21:46:14<4:09:04, 14.52s/it] + +{'loss': 0.4264, 'learning_rate': 1.0031399194785252e-06, 'epoch': 0.86} + + 86%|████████▌ | 6349/7378 
[21:46:14<4:09:04, 14.52s/it] + 86%|████████▌ | 6350/7378 [21:46:26<3:56:42, 13.82s/it] + +{'loss': 0.4316, 'learning_rate': 1.001224320142199e-06, 'epoch': 0.86} + + 86%|████████▌ | 6350/7378 [21:46:26<3:56:42, 13.82s/it] + 86%|████████▌ | 6351/7378 [21:46:41<4:03:24, 14.22s/it] + +{'loss': 0.4588, 'learning_rate': 9.99310455178859e-07, 'epoch': 0.86} + + 86%|████████▌ | 6351/7378 [21:46:41<4:03:24, 14.22s/it] + 86%|████████▌ | 6352/7378 [21:47:03<4:44:17, 16.63s/it] + +{'loss': 0.3735, 'learning_rate': 9.973983249573726e-07, 'epoch': 0.86} + + 86%|████████▌ | 6352/7378 [21:47:03<4:44:17, 16.63s/it] + 86%|████████▌ | 6353/7378 [21:47:15<4:19:29, 15.19s/it] + +{'loss': 0.4255, 'learning_rate': 9.954879298462717e-07, 'epoch': 0.86} + + 86%|████████▌ | 6353/7378 [21:47:15<4:19:29, 15.19s/it] + 86%|████████▌ | 6354/7378 [21:47:27<4:02:26, 14.21s/it] + +{'loss': 0.4437, 'learning_rate': 9.935792702137558e-07, 'epoch': 0.86} + + 86%|████████▌ | 6354/7378 [21:47:27<4:02:26, 14.21s/it] + 86%|████████▌ | 6355/7378 [21:47:39<3:51:13, 13.56s/it] + +{'loss': 0.4482, 'learning_rate': 9.916723464276924e-07, 'epoch': 0.86} + + 86%|████████▌ | 6355/7378 [21:47:39<3:51:13, 13.56s/it] + 86%|████████▌ | 6356/7378 [21:47:54<3:56:41, 13.90s/it] + +{'loss': 0.4396, 'learning_rate': 9.89767158855608e-07, 'epoch': 0.86} + + 86%|████████▌ | 6356/7378 [21:47:54<3:56:41, 13.90s/it] + 86%|████████▌ | 6357/7378 [21:48:06<3:47:37, 13.38s/it] + +{'loss': 0.486, 'learning_rate': 9.878637078646968e-07, 'epoch': 0.86} + + 86%|████████▌ | 6357/7378 [21:48:06<3:47:37, 13.38s/it] + 86%|████████▌ | 6358/7378 [21:48:18<3:40:20, 12.96s/it] + +{'loss': 0.4307, 'learning_rate': 9.859619938218223e-07, 'epoch': 0.86} + + 86%|████████▌ | 6358/7378 [21:48:18<3:40:20, 12.96s/it] + 86%|████████▌ | 6359/7378 [21:48:30<3:35:51, 12.71s/it] + +{'loss': 0.3803, 'learning_rate': 9.840620170935057e-07, 'epoch': 0.86} + + 86%|████████▌ | 6359/7378 [21:48:30<3:35:51, 12.71s/it] + 86%|████████▌ | 6360/7378 
[21:48:42<3:32:08, 12.50s/it] + +{'loss': 0.4437, 'learning_rate': 9.821637780459426e-07, 'epoch': 0.86} + + 86%|████████▌ | 6360/7378 [21:48:42<3:32:08, 12.50s/it] + 86%|████████▌ | 6361/7378 [21:48:54<3:32:03, 12.51s/it] + +{'loss': 0.4109, 'learning_rate': 9.80267277044985e-07, 'epoch': 0.86} + + 86%|████████▌ | 6361/7378 [21:48:54<3:32:03, 12.51s/it] + 86%|████████▌ | 6362/7378 [21:49:07<3:30:20, 12.42s/it] + +{'loss': 0.3882, 'learning_rate': 9.783725144561574e-07, 'epoch': 0.86} + + 86%|████████▌ | 6362/7378 [21:49:07<3:30:20, 12.42s/it] + 86%|████████▌ | 6363/7378 [21:49:19<3:28:29, 12.32s/it] + +{'loss': 0.4681, 'learning_rate': 9.764794906446395e-07, 'epoch': 0.86} + + 86%|████████▌ | 6363/7378 [21:49:19<3:28:29, 12.32s/it] + 86%|████████▋ | 6364/7378 [21:49:31<3:27:40, 12.29s/it] + +{'loss': 0.3888, 'learning_rate': 9.745882059752886e-07, 'epoch': 0.86} + + 86%|████████▋ | 6364/7378 [21:49:31<3:27:40, 12.29s/it] + 86%|████████▋ | 6365/7378 [21:49:43<3:25:24, 12.17s/it] + +{'loss': 0.465, 'learning_rate': 9.726986608126176e-07, 'epoch': 0.86} + + 86%|████████▋ | 6365/7378 [21:49:43<3:25:24, 12.17s/it] + 86%|████████▋ | 6366/7378 [21:49:55<3:25:19, 12.17s/it] + +{'loss': 0.4074, 'learning_rate': 9.708108555208073e-07, 'epoch': 0.86} + + 86%|████████▋ | 6366/7378 [21:49:55<3:25:19, 12.17s/it] + 86%|████████▋ | 6367/7378 [21:50:07<3:25:53, 12.22s/it] + +{'loss': 0.4664, 'learning_rate': 9.68924790463701e-07, 'epoch': 0.86} + + 86%|████████▋ | 6367/7378 [21:50:07<3:25:53, 12.22s/it] + 86%|████████▋ | 6368/7378 [21:50:19<3:23:24, 12.08s/it] + +{'loss': 0.4824, 'learning_rate': 9.670404660048072e-07, 'epoch': 0.86} + + 86%|████████▋ | 6368/7378 [21:50:19<3:23:24, 12.08s/it] + 86%|████████▋ | 6369/7378 [21:50:31<3:23:10, 12.08s/it] + +{'loss': 0.4791, 'learning_rate': 9.65157882507305e-07, 'epoch': 0.86} + + 86%|████████▋ | 6369/7378 [21:50:31<3:23:10, 12.08s/it] + 86%|████████▋ | 6370/7378 [21:50:43<3:22:40, 12.06s/it] + +{'loss': 0.4073, 'learning_rate': 
9.632770403340275e-07, 'epoch': 0.86} + + 86%|████████▋ | 6370/7378 [21:50:43<3:22:40, 12.06s/it] + 86%|████████▋ | 6371/7378 [21:50:55<3:22:59, 12.09s/it] + +{'loss': 0.461, 'learning_rate': 9.613979398474815e-07, 'epoch': 0.86} + + 86%|████████▋ | 6371/7378 [21:50:55<3:22:59, 12.09s/it] + 86%|████████▋ | 6372/7378 [21:51:08<3:23:38, 12.15s/it] + +{'loss': 0.4703, 'learning_rate': 9.59520581409832e-07, 'epoch': 0.86} + + 86%|████████▋ | 6372/7378 [21:51:08<3:23:38, 12.15s/it] + 86%|████████▋ | 6373/7378 [21:51:20<3:23:05, 12.13s/it] + +{'loss': 0.4149, 'learning_rate': 9.57644965382908e-07, 'epoch': 0.86} + + 86%|████████▋ | 6373/7378 [21:51:20<3:23:05, 12.13s/it] + 86%|████████▋ | 6374/7378 [21:51:32<3:23:45, 12.18s/it] + +{'loss': 0.4424, 'learning_rate': 9.557710921282105e-07, 'epoch': 0.86} + + 86%|████████▋ | 6374/7378 [21:51:32<3:23:45, 12.18s/it] + 86%|████████▋ | 6375/7378 [21:51:45<3:25:49, 12.31s/it] + +{'loss': 0.3823, 'learning_rate': 9.53898962006896e-07, 'epoch': 0.86} + + 86%|████████▋ | 6375/7378 [21:51:45<3:25:49, 12.31s/it] + 86%|████████▋ | 6376/7378 [21:51:57<3:26:14, 12.35s/it] + +{'loss': 0.4944, 'learning_rate': 9.520285753797897e-07, 'epoch': 0.86} + + 86%|████████▋ | 6376/7378 [21:51:57<3:26:14, 12.35s/it] + 86%|████████▋ | 6377/7378 [21:52:09<3:26:24, 12.37s/it] + +{'loss': 0.4397, 'learning_rate': 9.501599326073762e-07, 'epoch': 0.86} + + 86%|████████▋ | 6377/7378 [21:52:10<3:26:24, 12.37s/it] + 86%|████████▋ | 6378/7378 [21:52:22<3:26:09, 12.37s/it] + +{'loss': 0.5354, 'learning_rate': 9.482930340498109e-07, 'epoch': 0.86} + + 86%|████████▋ | 6378/7378 [21:52:22<3:26:09, 12.37s/it] + 86%|████████▋ | 6379/7378 [21:52:34<3:25:42, 12.35s/it] + +{'loss': 0.4048, 'learning_rate': 9.46427880066908e-07, 'epoch': 0.86} + + 86%|████████▋ | 6379/7378 [21:52:34<3:25:42, 12.35s/it] + 86%|████████▋ | 6380/7378 [21:52:47<3:26:35, 12.42s/it] + +{'loss': 0.4414, 'learning_rate': 9.445644710181467e-07, 'epoch': 0.86} + + 86%|████████▋ | 6380/7378 
[21:52:47<3:26:35, 12.42s/it] + 86%|████████▋ | 6381/7378 [21:52:59<3:24:17, 12.29s/it] + +{'loss': 0.4693, 'learning_rate': 9.427028072626687e-07, 'epoch': 0.86} + + 86%|████████▋ | 6381/7378 [21:52:59<3:24:17, 12.29s/it] + 87%|████████▋ | 6382/7378 [21:53:11<3:25:01, 12.35s/it] + +{'loss': 0.416, 'learning_rate': 9.408428891592802e-07, 'epoch': 0.87} + + 87%|████████▋ | 6382/7378 [21:53:11<3:25:01, 12.35s/it] + 87%|████████▋ | 6383/7378 [21:53:23<3:22:15, 12.20s/it] + +{'loss': 0.4449, 'learning_rate': 9.389847170664546e-07, 'epoch': 0.87} + + 87%|████████▋ | 6383/7378 [21:53:23<3:22:15, 12.20s/it] + 87%|████████▋ | 6384/7378 [21:53:35<3:21:31, 12.16s/it] + +{'loss': 0.4669, 'learning_rate': 9.37128291342323e-07, 'epoch': 0.87} + + 87%|████████▋ | 6384/7378 [21:53:35<3:21:31, 12.16s/it] + 87%|████████▋ | 6385/7378 [21:53:48<3:22:57, 12.26s/it] + +{'loss': 0.4213, 'learning_rate': 9.352736123446827e-07, 'epoch': 0.87} + + 87%|████████▋ | 6385/7378 [21:53:48<3:22:57, 12.26s/it] + 87%|████████▋ | 6386/7378 [21:54:00<3:23:02, 12.28s/it] + +{'loss': 0.4819, 'learning_rate': 9.334206804309919e-07, 'epoch': 0.87} + + 87%|████████▋ | 6386/7378 [21:54:00<3:23:02, 12.28s/it] + 87%|████████▋ | 6387/7378 [21:54:13<3:24:54, 12.41s/it] + +{'loss': 0.433, 'learning_rate': 9.315694959583788e-07, 'epoch': 0.87} + + 87%|████████▋ | 6387/7378 [21:54:13<3:24:54, 12.41s/it] + 87%|████████▋ | 6388/7378 [21:54:25<3:25:06, 12.43s/it] + +{'loss': 0.4441, 'learning_rate': 9.297200592836264e-07, 'epoch': 0.87} + + 87%|████████▋ | 6388/7378 [21:54:25<3:25:06, 12.43s/it] + 87%|████████▋ | 6389/7378 [21:54:38<3:24:56, 12.43s/it] + +{'loss': 0.4122, 'learning_rate': 9.278723707631865e-07, 'epoch': 0.87} + + 87%|████████▋ | 6389/7378 [21:54:38<3:24:56, 12.43s/it] + 87%|████████▋ | 6390/7378 [21:54:50<3:22:49, 12.32s/it] + +{'loss': 0.3757, 'learning_rate': 9.260264307531719e-07, 'epoch': 0.87} + + 87%|████████▋ | 6390/7378 [21:54:50<3:22:49, 12.32s/it] + 87%|████████▋ | 6391/7378
[21:55:02<3:21:30, 12.25s/it] + +{'loss': 0.4526, 'learning_rate': 9.241822396093569e-07, 'epoch': 0.87} + + 87%|████████▋ | 6391/7378 [21:55:02<3:21:30, 12.25s/it] + 87%|████████▋ | 6392/7378 [21:55:14<3:21:54, 12.29s/it] + +{'loss': 0.44, 'learning_rate': 9.223397976871829e-07, 'epoch': 0.87} + + 87%|████████▋ | 6392/7378 [21:55:14<3:21:54, 12.29s/it] + 87%|████████▋ | 6393/7378 [21:55:27<3:22:28, 12.33s/it] + +{'loss': 0.4094, 'learning_rate': 9.2049910534175e-07, 'epoch': 0.87} + + 87%|████████▋ | 6393/7378 [21:55:27<3:22:28, 12.33s/it] + 87%|████████▋ | 6394/7378 [21:55:39<3:21:22, 12.28s/it] + +{'loss': 0.4734, 'learning_rate': 9.186601629278236e-07, 'epoch': 0.87} + + 87%|████████▋ | 6394/7378 [21:55:39<3:21:22, 12.28s/it] + 87%|████████▋ | 6395/7378 [21:55:51<3:21:47, 12.32s/it] + +{'loss': 0.4159, 'learning_rate': 9.1682297079983e-07, 'epoch': 0.87} + + 87%|████████▋ | 6395/7378 [21:55:51<3:21:47, 12.32s/it] + 87%|████████▋ | 6396/7378 [21:56:03<3:20:06, 12.23s/it] + +{'loss': 0.4464, 'learning_rate': 9.149875293118604e-07, 'epoch': 0.87} + + 87%|████████▋ | 6396/7378 [21:56:03<3:20:06, 12.23s/it] + 87%|████████▋ | 6397/7378 [21:56:16<3:21:32, 12.33s/it] + +{'loss': 0.4329, 'learning_rate': 9.131538388176664e-07, 'epoch': 0.87} + + 87%|████████▋ | 6397/7378 [21:56:16<3:21:32, 12.33s/it] + 87%|████████▋ | 6398/7378 [21:56:28<3:20:01, 12.25s/it] + +{'loss': 0.4681, 'learning_rate': 9.113218996706652e-07, 'epoch': 0.87} + + 87%|████████▋ | 6398/7378 [21:56:28<3:20:01, 12.25s/it] + 87%|████████▋ | 6399/7378 [21:56:40<3:20:36, 12.29s/it] + +{'loss': 0.43, 'learning_rate': 9.094917122239322e-07, 'epoch': 0.87} + + 87%|████████▋ | 6399/7378 [21:56:40<3:20:36, 12.29s/it] + 87%|████████▋ | 6400/7378 [21:56:52<3:19:29, 12.24s/it] + +{'loss': 0.4203, 'learning_rate': 9.076632768302085e-07, 'epoch': 0.87} + + 87%|████████▋ | 6400/7378 [21:56:52<3:19:29, 12.24s/it] + 87%|████████▋ | 6401/7378 [21:57:05<3:20:30, 12.31s/it] + +{'loss': 0.4764, 'learning_rate': 
9.058365938418945e-07, 'epoch': 0.87} + + 87%|████████▋ | 6401/7378 [21:57:05<3:20:30, 12.31s/it] + 87%|████████▋ | 6402/7378 [21:57:17<3:20:04, 12.30s/it] + +{'loss': 0.4304, 'learning_rate': 9.040116636110574e-07, 'epoch': 0.87} + + 87%|████████▋ | 6402/7378 [21:57:17<3:20:04, 12.30s/it] + 87%|████████▋ | 6403/7378 [21:57:29<3:19:31, 12.28s/it] + +{'loss': 0.3984, 'learning_rate': 9.021884864894226e-07, 'epoch': 0.87} + + 87%|████████▋ | 6403/7378 [21:57:29<3:19:31, 12.28s/it] + 87%|████████▋ | 6404/7378 [21:57:42<3:21:31, 12.41s/it] + +{'loss': 0.4406, 'learning_rate': 9.003670628283789e-07, 'epoch': 0.87} + + 87%|████████▋ | 6404/7378 [21:57:42<3:21:31, 12.41s/it] + 87%|████████▋ | 6405/7378 [21:57:54<3:20:12, 12.35s/it] + +{'loss': 0.3749, 'learning_rate': 8.985473929789746e-07, 'epoch': 0.87} + + 87%|████████▋ | 6405/7378 [21:57:54<3:20:12, 12.35s/it] + 87%|████████▋ | 6406/7378 [21:58:06<3:18:14, 12.24s/it] + +{'loss': 0.4075, 'learning_rate': 8.967294772919277e-07, 'epoch': 0.87} + + 87%|████████▋ | 6406/7378 [21:58:06<3:18:14, 12.24s/it] + 87%|████████▋ | 6407/7378 [21:58:19<3:19:18, 12.32s/it] + +{'loss': 0.3808, 'learning_rate': 8.949133161176104e-07, 'epoch': 0.87} + + 87%|████████▋ | 6407/7378 [21:58:19<3:19:18, 12.32s/it] + 87%|████████▋ | 6408/7378 [21:58:31<3:19:27, 12.34s/it] + +{'loss': 0.5005, 'learning_rate': 8.930989098060594e-07, 'epoch': 0.87} + + 87%|████████▋ | 6408/7378 [21:58:31<3:19:27, 12.34s/it] + 87%|████████▋ | 6409/7378 [21:58:43<3:16:47, 12.19s/it] + +{'loss': 0.4006, 'learning_rate': 8.912862587069726e-07, 'epoch': 0.87} + + 87%|████████▋ | 6409/7378 [21:58:43<3:16:47, 12.19s/it] + 87%|████████▋ | 6410/7378 [21:58:55<3:16:23, 12.17s/it] + +{'loss': 0.4443, 'learning_rate': 8.894753631697095e-07, 'epoch': 0.87} + + 87%|████████▋ | 6410/7378 [21:58:55<3:16:23, 12.17s/it] + 87%|████████▋ | 6411/7378 [21:59:07<3:17:21, 12.25s/it] + +{'loss': 0.4076, 'learning_rate': 8.876662235432931e-07, 'epoch': 0.87} + + 87%|████████▋ | 6411/7378
[21:59:07<3:17:21, 12.25s/it] + 87%|████████▋ | 6412/7378 [21:59:20<3:17:46, 12.28s/it] + +{'loss': 0.4354, 'learning_rate': 8.858588401764079e-07, 'epoch': 0.87} + + 87%|████████▋ | 6412/7378 [21:59:20<3:17:46, 12.28s/it] + 87%|████████▋ | 6413/7378 [21:59:32<3:17:24, 12.27s/it] + +{'loss': 0.4874, 'learning_rate': 8.840532134173963e-07, 'epoch': 0.87} + + 87%|████████▋ | 6413/7378 [21:59:32<3:17:24, 12.27s/it] + 87%|████████▋ | 6414/7378 [21:59:44<3:17:48, 12.31s/it] + +{'loss': 0.4287, 'learning_rate': 8.822493436142643e-07, 'epoch': 0.87} + + 87%|████████▋ | 6414/7378 [21:59:44<3:17:48, 12.31s/it] + 87%|████████▋ | 6415/7378 [21:59:57<3:20:07, 12.47s/it] + +{'loss': 0.4547, 'learning_rate': 8.804472311146817e-07, 'epoch': 0.87} + + 87%|████████▋ | 6415/7378 [21:59:57<3:20:07, 12.47s/it] + 87%|████████▋ | 6416/7378 [22:00:10<3:19:36, 12.45s/it] + +{'loss': 0.4072, 'learning_rate': 8.786468762659772e-07, 'epoch': 0.87} + + 87%|████████▋ | 6416/7378 [22:00:10<3:19:36, 12.45s/it] + 87%|████████▋ | 6417/7378 [22:00:22<3:17:58, 12.36s/it] + +{'loss': 0.4236, 'learning_rate': 8.768482794151389e-07, 'epoch': 0.87} + + 87%|████████▋ | 6417/7378 [22:00:22<3:17:58, 12.36s/it] + 87%|████████▋ | 6418/7378 [22:00:34<3:17:09, 12.32s/it] + +{'loss': 0.4615, 'learning_rate': 8.750514409088206e-07, 'epoch': 0.87} + + 87%|████████▋ | 6418/7378 [22:00:34<3:17:09, 12.32s/it] + 87%|████████▋ | 6419/7378 [22:00:46<3:17:12, 12.34s/it] + +{'loss': 0.3987, 'learning_rate': 8.73256361093332e-07, 'epoch': 0.87} + + 87%|████████▋ | 6419/7378 [22:00:46<3:17:12, 12.34s/it] + 87%|████████▋ | 6420/7378 [22:00:59<3:16:45, 12.32s/it] + +{'loss': 0.3693, 'learning_rate': 8.714630403146496e-07, 'epoch': 0.87} + + 87%|████████▋ | 6420/7378 [22:00:59<3:16:45, 12.32s/it] + 87%|████████▋ | 6421/7378 [22:01:12<3:18:50, 12.47s/it] + +{'loss': 0.4756, 'learning_rate': 8.696714789184069e-07, 'epoch': 0.87} + + 87%|████████▋ | 6421/7378 [22:01:12<3:18:50, 12.47s/it] + 87%|████████▋ | 6422/7378 
[22:01:24<3:17:43, 12.41s/it] + +{'loss': 0.4696, 'learning_rate': 8.678816772498988e-07, 'epoch': 0.87} + + 87%|████████▋ | 6422/7378 [22:01:24<3:17:43, 12.41s/it] + 87%|████████▋ | 6423/7378 [22:01:36<3:17:56, 12.44s/it] + +{'loss': 0.4415, 'learning_rate': 8.660936356540794e-07, 'epoch': 0.87} + + 87%|████████▋ | 6423/7378 [22:01:36<3:17:56, 12.44s/it] + 87%|████████▋ | 6424/7378 [22:01:49<3:20:15, 12.59s/it] + +{'loss': 0.4394, 'learning_rate': 8.643073544755709e-07, 'epoch': 0.87} + + 87%|████████▋ | 6424/7378 [22:01:49<3:20:15, 12.59s/it] + 87%|████████▋ | 6425/7378 [22:02:01<3:16:21, 12.36s/it] + +{'loss': 0.4216, 'learning_rate': 8.625228340586467e-07, 'epoch': 0.87} + + 87%|████████▋ | 6425/7378 [22:02:01<3:16:21, 12.36s/it] + 87%|████████▋ | 6426/7378 [22:02:13<3:15:22, 12.31s/it] + +{'loss': 0.4666, 'learning_rate': 8.607400747472471e-07, 'epoch': 0.87} + + 87%|████████▋ | 6426/7378 [22:02:13<3:15:22, 12.31s/it] + 87%|████████▋ | 6427/7378 [22:02:25<3:12:10, 12.12s/it] + +{'loss': 0.4014, 'learning_rate': 8.589590768849698e-07, 'epoch': 0.87} + + 87%|████████▋ | 6427/7378 [22:02:25<3:12:10, 12.12s/it] + 87%|████████▋ | 6428/7378 [22:02:37<3:11:13, 12.08s/it] + +{'loss': 0.461, 'learning_rate': 8.571798408150745e-07, 'epoch': 0.87} + + 87%|████████▋ | 6428/7378 [22:02:37<3:11:13, 12.08s/it] + 87%|████████▋ | 6429/7378 [22:02:49<3:11:10, 12.09s/it] + +{'loss': 0.4176, 'learning_rate': 8.554023668804812e-07, 'epoch': 0.87} + + 87%|████████▋ | 6429/7378 [22:02:49<3:11:10, 12.09s/it] + 87%|████████▋ | 6430/7378 [22:03:01<3:11:27, 12.12s/it] + +{'loss': 0.3934, 'learning_rate': 8.536266554237715e-07, 'epoch': 0.87} + + 87%|████████▋ | 6430/7378 [22:03:01<3:11:27, 12.12s/it] + 87%|████████▋ | 6431/7378 [22:03:13<3:10:42, 12.08s/it] + +{'loss': 0.4321, 'learning_rate': 8.518527067871851e-07, 'epoch': 0.87} + + 87%|████████▋ | 6431/7378 [22:03:13<3:10:42, 12.08s/it] + 87%|████████▋ | 6432/7378 [22:03:25<3:11:17, 12.13s/it] + +{'loss': 0.3961, 'learning_rate': 
8.500805213126217e-07, 'epoch': 0.87} + + 87%|████████▋ | 6432/7378 [22:03:26<3:11:17, 12.13s/it] + 87%|████████▋ | 6433/7378 [22:03:37<3:09:34, 12.04s/it] + +{'loss': 0.447, 'learning_rate': 8.483100993416415e-07, 'epoch': 0.87} + + 87%|████████▋ | 6433/7378 [22:03:37<3:09:34, 12.04s/it] + 87%|████████▋ | 6434/7378 [22:03:50<3:10:46, 12.13s/it] + +{'loss': 0.4319, 'learning_rate': 8.465414412154693e-07, 'epoch': 0.87} + + 87%|████████▋ | 6434/7378 [22:03:50<3:10:46, 12.13s/it] + 87%|████████▋ | 6435/7378 [22:04:02<3:11:40, 12.20s/it] + +{'loss': 0.3591, 'learning_rate': 8.447745472749836e-07, 'epoch': 0.87} + + 87%|████████▋ | 6435/7378 [22:04:02<3:11:40, 12.20s/it] + 87%|████████▋ | 6436/7378 [22:04:14<3:10:00, 12.10s/it] + +{'loss': 0.4271, 'learning_rate': 8.430094178607262e-07, 'epoch': 0.87} + + 87%|████████▋ | 6436/7378 [22:04:14<3:10:00, 12.10s/it] + 87%|████████▋ | 6437/7378 [22:04:26<3:09:34, 12.09s/it] + +{'loss': 0.4164, 'learning_rate': 8.412460533128964e-07, 'epoch': 0.87} + + 87%|████████▋ | 6437/7378 [22:04:26<3:09:34, 12.09s/it] + 87%|████████▋ | 6438/7378 [22:04:38<3:08:07, 12.01s/it] + +{'loss': 0.4379, 'learning_rate': 8.394844539713586e-07, 'epoch': 0.87} + + 87%|████████▋ | 6438/7378 [22:04:38<3:08:07, 12.01s/it] + 87%|████████▋ | 6439/7378 [22:04:50<3:09:33, 12.11s/it] + +{'loss': 0.4249, 'learning_rate': 8.377246201756306e-07, 'epoch': 0.87} + + 87%|████████▋ | 6439/7378 [22:04:50<3:09:33, 12.11s/it] + 87%|████████▋ | 6440/7378 [22:05:03<3:12:19, 12.30s/it] + +{'loss': 0.3988, 'learning_rate': 8.35966552264893e-07, 'epoch': 0.87} + + 87%|████████▋ | 6440/7378 [22:05:03<3:12:19, 12.30s/it] + 87%|████████▋ | 6441/7378 [22:05:15<3:13:05, 12.36s/it] + +{'loss': 0.4549, 'learning_rate': 8.34210250577987e-07, 'epoch': 0.87} + + 87%|████████▋ | 6441/7378 [22:05:15<3:13:05, 12.36s/it] + 87%|████████▋ | 6442/7378 [22:05:28<3:12:23, 12.33s/it] + +{'loss': 0.4368, 'learning_rate': 8.32455715453413e-07, 'epoch': 0.87} + + 87%|████████▋ | 6442/7378 
[22:05:28<3:12:23, 12.33s/it] + 87%|████████▋ | 6443/7378 [22:05:40<3:11:53, 12.31s/it] + +{'loss': 0.424, 'learning_rate': 8.307029472293271e-07, 'epoch': 0.87} + + 87%|████████▋ | 6443/7378 [22:05:40<3:11:53, 12.31s/it] + 87%|████████▋ | 6444/7378 [22:05:52<3:10:30, 12.24s/it] + +{'loss': 0.4865, 'learning_rate': 8.289519462435502e-07, 'epoch': 0.87} + + 87%|████████▋ | 6444/7378 [22:05:52<3:10:30, 12.24s/it] + 87%|████████▋ | 6445/7378 [22:06:04<3:10:18, 12.24s/it] + +{'loss': 0.3779, 'learning_rate': 8.272027128335602e-07, 'epoch': 0.87} + + 87%|████████▋ | 6445/7378 [22:06:04<3:10:18, 12.24s/it] + 87%|████████▋ | 6446/7378 [22:06:16<3:07:59, 12.10s/it] + +{'loss': 0.4354, 'learning_rate': 8.254552473364952e-07, 'epoch': 0.87} + + 87%|████████▋ | 6446/7378 [22:06:16<3:07:59, 12.10s/it] + 87%|████████▋ | 6447/7378 [22:06:28<3:08:21, 12.14s/it] + +{'loss': 0.4436, 'learning_rate': 8.237095500891479e-07, 'epoch': 0.87} + + 87%|████████▋ | 6447/7378 [22:06:28<3:08:21, 12.14s/it] + 87%|████████▋ | 6448/7378 [22:06:40<3:07:52, 12.12s/it] + +{'loss': 0.4197, 'learning_rate': 8.21965621427978e-07, 'epoch': 0.87} + + 87%|████████▋ | 6448/7378 [22:06:40<3:07:52, 12.12s/it] + 87%|████████▋ | 6449/7378 [22:06:53<3:10:28, 12.30s/it] + +{'loss': 0.4613, 'learning_rate': 8.202234616891002e-07, 'epoch': 0.87} + + 87%|████████▋ | 6449/7378 [22:06:53<3:10:28, 12.30s/it] + 87%|████████▋ | 6450/7378 [22:07:06<3:11:24, 12.38s/it] + +{'loss': 0.3903, 'learning_rate': 8.18483071208287e-07, 'epoch': 0.87} + + 87%|████████▋ | 6450/7378 [22:07:06<3:11:24, 12.38s/it] + 87%|████████▋ | 6451/7378 [22:07:18<3:11:59, 12.43s/it] + +{'loss': 0.4102, 'learning_rate': 8.167444503209721e-07, 'epoch': 0.87} + + 87%|████████▋ | 6451/7378 [22:07:18<3:11:59, 12.43s/it] + 87%|████████▋ | 6452/7378 [22:07:30<3:10:21, 12.33s/it] + +{'loss': 0.4258, 'learning_rate': 8.150075993622452e-07, 'epoch': 0.87} + + 87%|████████▋ | 6452/7378 [22:07:30<3:10:21, 12.33s/it] + 87%|████████▋ | 6453/7378 
[22:07:42<3:08:04, 12.20s/it] + +{'loss': 0.4446, 'learning_rate': 8.13272518666861e-07, 'epoch': 0.87} + + 87%|████████▋ | 6453/7378 [22:07:42<3:08:04, 12.20s/it] + 87%|████████▋ | 6454/7378 [22:07:55<3:09:58, 12.34s/it] + +{'loss': 0.4439, 'learning_rate': 8.115392085692275e-07, 'epoch': 0.87} + + 87%|████████▋ | 6454/7378 [22:07:55<3:09:58, 12.34s/it] + 87%|████████▋ | 6455/7378 [22:08:07<3:09:58, 12.35s/it] + +{'loss': 0.3847, 'learning_rate': 8.098076694034129e-07, 'epoch': 0.87} + + 87%|████████▋ | 6455/7378 [22:08:07<3:09:58, 12.35s/it] + 88%|████████▊ | 6456/7378 [22:08:19<3:09:26, 12.33s/it] + +{'loss': 0.4269, 'learning_rate': 8.080779015031426e-07, 'epoch': 0.88} + + 88%|████████▊ | 6456/7378 [22:08:19<3:09:26, 12.33s/it] + 88%|████████▊ | 6457/7378 [22:08:31<3:08:01, 12.25s/it] + +{'loss': 0.4419, 'learning_rate': 8.063499052018042e-07, 'epoch': 0.88} + + 88%|████████▊ | 6457/7378 [22:08:31<3:08:01, 12.25s/it] + 88%|████████▊ | 6458/7378 [22:08:44<3:07:14, 12.21s/it] + +{'loss': 0.4513, 'learning_rate': 8.046236808324426e-07, 'epoch': 0.88} + + 88%|████████▊ | 6458/7378 [22:08:44<3:07:14, 12.21s/it] + 88%|████████▊ | 6459/7378 [22:08:56<3:08:04, 12.28s/it] + +{'loss': 0.3742, 'learning_rate': 8.028992287277593e-07, 'epoch': 0.88} + + 88%|████████▊ | 6459/7378 [22:08:56<3:08:04, 12.28s/it] + 88%|████████▊ | 6460/7378 [22:09:09<3:11:04, 12.49s/it] + +{'loss': 0.4685, 'learning_rate': 8.011765492201151e-07, 'epoch': 0.88} + + 88%|████████▊ | 6460/7378 [22:09:09<3:11:04, 12.49s/it] + 88%|████████▊ | 6461/7378 [22:09:22<3:10:50, 12.49s/it] + +{'loss': 0.42, 'learning_rate': 7.994556426415279e-07, 'epoch': 0.88} + + 88%|████████▊ | 6461/7378 [22:09:22<3:10:50, 12.49s/it] + 88%|████████▊ | 6462/7378 [22:09:34<3:10:49, 12.50s/it] + +{'loss': 0.4426, 'learning_rate': 7.9773650932368e-07, 'epoch': 0.88} + + 88%|████████▊ | 6462/7378 [22:09:34<3:10:49, 12.50s/it] + 88%|████████▊ | 6463/7378 [22:09:46<3:09:35, 12.43s/it] + +{'loss': 0.4623, 'learning_rate': 
7.960191495979041e-07, 'epoch': 0.88} + + 88%|████████▊ | 6463/7378 [22:09:46<3:09:35, 12.43s/it] + 88%|████████▊ | 6464/7378 [22:09:58<3:06:45, 12.26s/it] + +{'loss': 0.4323, 'learning_rate': 7.943035637951957e-07, 'epoch': 0.88} + + 88%|████████▊ | 6464/7378 [22:09:58<3:06:45, 12.26s/it] + 88%|████████▊ | 6465/7378 [22:10:10<3:05:45, 12.21s/it] + +{'loss': 0.4333, 'learning_rate': 7.925897522462045e-07, 'epoch': 0.88} + + 88%|████████▊ | 6465/7378 [22:10:10<3:05:45, 12.21s/it] + 88%|████████▊ | 6466/7378 [22:10:22<3:05:06, 12.18s/it] + +{'loss': 0.4629, 'learning_rate': 7.908777152812452e-07, 'epoch': 0.88} + + 88%|████████▊ | 6466/7378 [22:10:22<3:05:06, 12.18s/it] + 88%|████████▊ | 6467/7378 [22:10:35<3:05:52, 12.24s/it] + +{'loss': 0.4201, 'learning_rate': 7.891674532302828e-07, 'epoch': 0.88} + + 88%|████████▊ | 6467/7378 [22:10:35<3:05:52, 12.24s/it] + 88%|████████▊ | 6468/7378 [22:10:47<3:06:13, 12.28s/it] + +{'loss': 0.4508, 'learning_rate': 7.874589664229448e-07, 'epoch': 0.88} + + 88%|████████▊ | 6468/7378 [22:10:47<3:06:13, 12.28s/it] + 88%|████████▊ | 6469/7378 [22:10:59<3:05:30, 12.24s/it] + +{'loss': 0.4693, 'learning_rate': 7.857522551885155e-07, 'epoch': 0.88} + + 88%|████████▊ | 6469/7378 [22:10:59<3:05:30, 12.24s/it] + 88%|████████▊ | 6470/7378 [22:11:12<3:05:27, 12.26s/it] + +{'loss': 0.4233, 'learning_rate': 7.840473198559339e-07, 'epoch': 0.88} + + 88%|████████▊ | 6470/7378 [22:11:12<3:05:27, 12.26s/it] + 88%|████████▊ | 6471/7378 [22:11:24<3:05:29, 12.27s/it] + +{'loss': 0.4235, 'learning_rate': 7.823441607538029e-07, 'epoch': 0.88} + + 88%|████████▊ | 6471/7378 [22:11:24<3:05:29, 12.27s/it] + 88%|████████▊ | 6472/7378 [22:11:36<3:04:04, 12.19s/it] + +{'loss': 0.4608, 'learning_rate': 7.806427782103798e-07, 'epoch': 0.88} + + 88%|████████▊ | 6472/7378 [22:11:36<3:04:04, 12.19s/it] + 88%|████████▊ | 6473/7378 [22:11:48<3:05:03, 12.27s/it] + +{'loss': 0.4162, 'learning_rate': 7.789431725535768e-07, 'epoch': 0.88} + + 88%|████████▊ | 6473/7378 
[22:11:48<3:05:03, 12.27s/it] + 88%|████████▊ | 6474/7378 [22:12:01<3:04:33, 12.25s/it] + +{'loss': 0.3957, 'learning_rate': 7.772453441109674e-07, 'epoch': 0.88} + + 88%|████████▊ | 6474/7378 [22:12:01<3:04:33, 12.25s/it] + 88%|████████▊ | 6475/7378 [22:12:13<3:04:12, 12.24s/it] + +{'loss': 0.445, 'learning_rate': 7.75549293209783e-07, 'epoch': 0.88} + + 88%|███��████▊ | 6475/7378 [22:12:13<3:04:12, 12.24s/it] + 88%|████████▊ | 6476/7378 [22:12:25<3:04:31, 12.27s/it] + +{'loss': 0.4736, 'learning_rate': 7.738550201769091e-07, 'epoch': 0.88} + + 88%|████████▊ | 6476/7378 [22:12:25<3:04:31, 12.27s/it] + 88%|████████▊ | 6477/7378 [22:12:37<3:03:23, 12.21s/it] + +{'loss': 0.4111, 'learning_rate': 7.721625253388909e-07, 'epoch': 0.88} + + 88%|████████▊ | 6477/7378 [22:12:37<3:03:23, 12.21s/it] + 88%|████████▊ | 6478/7378 [22:12:49<3:03:08, 12.21s/it] + +{'loss': 0.4068, 'learning_rate': 7.704718090219299e-07, 'epoch': 0.88} + + 88%|████████▊ | 6478/7378 [22:12:49<3:03:08, 12.21s/it] + 88%|████████▊ | 6479/7378 [22:13:02<3:03:25, 12.24s/it] + +{'loss': 0.4467, 'learning_rate': 7.687828715518842e-07, 'epoch': 0.88} + + 88%|████████▊ | 6479/7378 [22:13:02<3:03:25, 12.24s/it] + 88%|████████▊ | 6480/7378 [22:13:14<3:03:24, 12.25s/it] + +{'loss': 0.3629, 'learning_rate': 7.670957132542722e-07, 'epoch': 0.88} + + 88%|████████▊ | 6480/7378 [22:13:14<3:03:24, 12.25s/it] + 88%|████████▊ | 6481/7378 [22:13:27<3:04:52, 12.37s/it] + +{'loss': 0.4106, 'learning_rate': 7.654103344542674e-07, 'epoch': 0.88} + + 88%|████████▊ | 6481/7378 [22:13:27<3:04:52, 12.37s/it] + 88%|████████▊ | 6482/7378 [22:13:39<3:04:11, 12.33s/it] + +{'loss': 0.4608, 'learning_rate': 7.637267354766975e-07, 'epoch': 0.88} + + 88%|████████▊ | 6482/7378 [22:13:39<3:04:11, 12.33s/it] + 88%|████████▊ | 6483/7378 [22:13:51<3:03:57, 12.33s/it] + +{'loss': 0.4675, 'learning_rate': 7.62044916646052e-07, 'epoch': 0.88} + + 88%|████████▊ | 6483/7378 [22:13:51<3:03:57, 12.33s/it] + 88%|████████▊ | 6484/7378 
[22:14:03<3:03:23, 12.31s/it] + +{'loss': 0.4314, 'learning_rate': 7.603648782864714e-07, 'epoch': 0.88} + + 88%|████████▊ | 6484/7378 [22:14:03<3:03:23, 12.31s/it] + 88%|████████▊ | 6485/7378 [22:14:16<3:04:28, 12.39s/it] + +{'loss': 0.3993, 'learning_rate': 7.586866207217625e-07, 'epoch': 0.88} + + 88%|████████▊ | 6485/7378 [22:14:16<3:04:28, 12.39s/it] + 88%|████████▊ | 6486/7378 [22:14:28<3:02:29, 12.28s/it] + +{'loss': 0.4675, 'learning_rate': 7.570101442753808e-07, 'epoch': 0.88} + + 88%|████████▊ | 6486/7378 [22:14:28<3:02:29, 12.28s/it] + 88%|████████▊ | 6487/7378 [22:14:40<3:01:46, 12.24s/it] + +{'loss': 0.4341, 'learning_rate': 7.553354492704401e-07, 'epoch': 0.88} + + 88%|████████▊ | 6487/7378 [22:14:40<3:01:46, 12.24s/it] + 88%|████████▊ | 6488/7378 [22:14:52<3:00:41, 12.18s/it] + +{'loss': 0.4498, 'learning_rate': 7.536625360297122e-07, 'epoch': 0.88} + + 88%|████████▊ | 6488/7378 [22:14:52<3:00:41, 12.18s/it] + 88%|████████▊ | 6489/7378 [22:15:05<3:02:03, 12.29s/it] + +{'loss': 0.4291, 'learning_rate': 7.519914048756238e-07, 'epoch': 0.88} + + 88%|████████▊ | 6489/7378 [22:15:05<3:02:03, 12.29s/it] + 88%|████████▊ | 6490/7378 [22:15:17<3:02:32, 12.33s/it] + +{'loss': 0.3959, 'learning_rate': 7.503220561302604e-07, 'epoch': 0.88} + + 88%|████████▊ | 6490/7378 [22:15:17<3:02:32, 12.33s/it] + 88%|████████▊ | 6491/7378 [22:15:30<3:03:12, 12.39s/it] + +{'loss': 0.4354, 'learning_rate': 7.486544901153637e-07, 'epoch': 0.88} + + 88%|████████▊ | 6491/7378 [22:15:30<3:03:12, 12.39s/it] + 88%|████████▊ | 6492/7378 [22:15:42<3:03:03, 12.40s/it] + +{'loss': 0.4476, 'learning_rate': 7.469887071523297e-07, 'epoch': 0.88} + + 88%|████████▊ | 6492/7378 [22:15:42<3:03:03, 12.40s/it] + 88%|████████▊ | 6493/7378 [22:15:54<3:01:02, 12.27s/it] + +{'loss': 0.4393, 'learning_rate': 7.453247075622117e-07, 'epoch': 0.88} + + 88%|████████▊ | 6493/7378 [22:15:54<3:01:02, 12.27s/it] + 88%|████████▊ | 6494/7378 [22:16:06<3:00:20, 12.24s/it] + +{'loss': 0.4001, 'learning_rate': 
7.436624916657176e-07, 'epoch': 0.88} + + 88%|████████▊ | 6494/7378 [22:16:06<3:00:20, 12.24s/it] + 88%|████████▊ | 6495/7378 [22:16:19<2:59:58, 12.23s/it] + +{'loss': 0.4087, 'learning_rate': 7.420020597832178e-07, 'epoch': 0.88} + + 88%|████████▊ | 6495/7378 [22:16:19<2:59:58, 12.23s/it] + 88%|████████▊ | 6496/7378 [22:16:31<2:58:44, 12.16s/it] + +{'loss': 0.4483, 'learning_rate': 7.40343412234733e-07, 'epoch': 0.88} + + 88%|████████▊ | 6496/7378 [22:16:31<2:58:44, 12.16s/it] + 88%|████████▊ | 6497/7378 [22:16:43<3:02:08, 12.40s/it] + +{'loss': 0.4274, 'learning_rate': 7.386865493399398e-07, 'epoch': 0.88} + + 88%|████████▊ | 6497/7378 [22:16:43<3:02:08, 12.40s/it] + 88%|████████▊ | 6498/7378 [22:16:56<3:03:06, 12.48s/it] + +{'loss': 0.4456, 'learning_rate': 7.370314714181726e-07, 'epoch': 0.88} + + 88%|████████▊ | 6498/7378 [22:16:56<3:03:06, 12.48s/it] + 88%|████████▊ | 6499/7378 [22:17:09<3:03:15, 12.51s/it] + +{'loss': 0.3417, 'learning_rate': 7.353781787884251e-07, 'epoch': 0.88} + + 88%|████████▊ | 6499/7378 [22:17:09<3:03:15, 12.51s/it] + 88%|████████▊ | 6500/7378 [22:17:21<3:01:46, 12.42s/it] + +{'loss': 0.3912, 'learning_rate': 7.337266717693414e-07, 'epoch': 0.88} + + 88%|████████▊ | 6500/7378 [22:17:21<3:01:46, 12.42s/it] + 88%|████████▊ | 6501/7378 [22:17:33<3:01:27, 12.41s/it] + +{'loss': 0.4127, 'learning_rate': 7.320769506792225e-07, 'epoch': 0.88} + + 88%|████████▊ | 6501/7378 [22:17:33<3:01:27, 12.41s/it] + 88%|████████▊ | 6502/7378 [22:17:45<2:58:56, 12.26s/it] + +{'loss': 0.3909, 'learning_rate': 7.304290158360283e-07, 'epoch': 0.88} + + 88%|████████▊ | 6502/7378 [22:17:45<2:58:56, 12.26s/it] + 88%|████████▊ | 6503/7378 [22:17:58<2:59:56, 12.34s/it] + +{'loss': 0.4152, 'learning_rate': 7.287828675573694e-07, 'epoch': 0.88} + + 88%|████████▊ | 6503/7378 [22:17:58<2:59:56, 12.34s/it] + 88%|████████▊ | 6504/7378 [22:18:10<3:00:22, 12.38s/it] + +{'loss': 0.4352, 'learning_rate': 7.271385061605185e-07, 'epoch': 0.88} + + 88%|████████▊ | 6504/7378 
[22:18:10<3:00:22, 12.38s/it] + 88%|████████▊ | 6505/7378 [22:18:22<2:58:15, 12.25s/it] + +{'loss': 0.455, 'learning_rate': 7.254959319623989e-07, 'epoch': 0.88} + + 88%|████████▊ | 6505/7378 [22:18:22<2:58:15, 12.25s/it] + 88%|████████▊ | 6506/7378 [22:18:35<2:58:20, 12.27s/it] + +{'loss': 0.4593, 'learning_rate': 7.238551452795917e-07, 'epoch': 0.88} + + 88%|████████▊ | 6506/7378 [22:18:35<2:58:20, 12.27s/it] + 88%|████████▊ | 6507/7378 [22:18:47<2:58:04, 12.27s/it] + +{'loss': 0.5013, 'learning_rate': 7.222161464283307e-07, 'epoch': 0.88} + + 88%|████████▊ | 6507/7378 [22:18:47<2:58:04, 12.27s/it] + 88%|████████▊ | 6508/7378 [22:18:59<2:58:29, 12.31s/it] + +{'loss': 0.4183, 'learning_rate': 7.205789357245097e-07, 'epoch': 0.88} + + 88%|████████▊ | 6508/7378 [22:18:59<2:58:29, 12.31s/it] + 88%|████████▊ | 6509/7378 [22:19:12<2:59:26, 12.39s/it] + +{'loss': 0.4692, 'learning_rate': 7.189435134836753e-07, 'epoch': 0.88} + + 88%|████████▊ | 6509/7378 [22:19:12<2:59:26, 12.39s/it] + 88%|████████▊ | 6510/7378 [22:19:24<2:57:53, 12.30s/it] + +{'loss': 0.3203, 'learning_rate': 7.173098800210287e-07, 'epoch': 0.88} + + 88%|████████▊ | 6510/7378 [22:19:24<2:57:53, 12.30s/it] + 88%|████████▊ | 6511/7378 [22:19:36<2:55:30, 12.15s/it] + +{'loss': 0.4419, 'learning_rate': 7.15678035651427e-07, 'epoch': 0.88} + + 88%|████████▊ | 6511/7378 [22:19:36<2:55:30, 12.15s/it] + 88%|████████▊ | 6512/7378 [22:19:48<2:54:34, 12.09s/it] + +{'loss': 0.3708, 'learning_rate': 7.140479806893818e-07, 'epoch': 0.88} + + 88%|████████▊ | 6512/7378 [22:19:48<2:54:34, 12.09s/it] + 88%|████████▊ | 6513/7378 [22:19:59<2:52:50, 11.99s/it] + +{'loss': 0.4537, 'learning_rate': 7.124197154490631e-07, 'epoch': 0.88} + + 88%|████████▊ | 6513/7378 [22:19:59<2:52:50, 11.99s/it] + 88%|████████▊ | 6514/7378 [22:20:11<2:52:25, 11.97s/it] + +{'loss': 0.4364, 'learning_rate': 7.107932402442919e-07, 'epoch': 0.88} + + 88%|████████▊ | 6514/7378 [22:20:11<2:52:25, 11.97s/it] + 88%|████████▊ | 6515/7378 
[22:20:24<2:55:48, 12.22s/it] + +{'loss': 0.4384, 'learning_rate': 7.091685553885464e-07, 'epoch': 0.88} + + 88%|████████▊ | 6515/7378 [22:20:24<2:55:48, 12.22s/it] + 88%|████████▊ | 6516/7378 [22:20:36<2:56:07, 12.26s/it] + +{'loss': 0.4116, 'learning_rate': 7.075456611949572e-07, 'epoch': 0.88} + + 88%|████████▊ | 6516/7378 [22:20:36<2:56:07, 12.26s/it] + 88%|████████▊ | 6517/7378 [22:20:49<2:55:41, 12.24s/it] + +{'loss': 0.4424, 'learning_rate': 7.059245579763141e-07, 'epoch': 0.88} + + 88%|████████▊ | 6517/7378 [22:20:49<2:55:41, 12.24s/it] + 88%|████████▊ | 6518/7378 [22:21:01<2:54:25, 12.17s/it] + +{'loss': 0.4663, 'learning_rate': 7.043052460450595e-07, 'epoch': 0.88} + + 88%|████████▊ | 6518/7378 [22:21:01<2:54:25, 12.17s/it] + 88%|████████▊ | 6519/7378 [22:21:13<2:54:42, 12.20s/it] + +{'loss': 0.4011, 'learning_rate': 7.026877257132891e-07, 'epoch': 0.88} + + 88%|████████▊ | 6519/7378 [22:21:13<2:54:42, 12.20s/it] + 88%|████████▊ | 6520/7378 [22:21:26<2:56:43, 12.36s/it] + +{'loss': 0.4924, 'learning_rate': 7.010719972927549e-07, 'epoch': 0.88} + + 88%|████████▊ | 6520/7378 [22:21:26<2:56:43, 12.36s/it] + 88%|████████▊ | 6521/7378 [22:21:38<2:57:09, 12.40s/it] + +{'loss': 0.4787, 'learning_rate': 6.99458061094861e-07, 'epoch': 0.88} + + 88%|████████▊ | 6521/7378 [22:21:38<2:57:09, 12.40s/it] + 88%|████████▊ | 6522/7378 [22:21:51<2:57:04, 12.41s/it] + +{'loss': 0.4058, 'learning_rate': 6.978459174306729e-07, 'epoch': 0.88} + + 88%|████████▊ | 6522/7378 [22:21:51<2:57:04, 12.41s/it] + 88%|████████▊ | 6523/7378 [22:22:03<2:56:20, 12.37s/it] + +{'loss': 0.4417, 'learning_rate': 6.962355666109033e-07, 'epoch': 0.88} + + 88%|████████▊ | 6523/7378 [22:22:03<2:56:20, 12.37s/it] + 88%|████████▊ | 6524/7378 [22:22:15<2:55:15, 12.31s/it] + +{'loss': 0.3703, 'learning_rate': 6.946270089459228e-07, 'epoch': 0.88} + + 88%|████████▊ | 6524/7378 [22:22:15<2:55:15, 12.31s/it] + 88%|████████▊ | 6525/7378 [22:22:27<2:54:54, 12.30s/it] + +{'loss': 0.4207, 'learning_rate': 
6.930202447457535e-07, 'epoch': 0.88} + + 88%|████████▊ | 6525/7378 [22:22:27<2:54:54, 12.30s/it] + 88%|████████▊ | 6526/7378 [22:22:40<2:56:13, 12.41s/it] + +{'loss': 0.4449, 'learning_rate': 6.914152743200775e-07, 'epoch': 0.88} + + 88%|████████▊ | 6526/7378 [22:22:40<2:56:13, 12.41s/it] + 88%|████████▊ | 6527/7378 [22:22:52<2:55:57, 12.41s/it] + +{'loss': 0.4519, 'learning_rate': 6.898120979782264e-07, 'epoch': 0.88} + + 88%|████████▊ | 6527/7378 [22:22:52<2:55:57, 12.41s/it] + 88%|████████▊ | 6528/7378 [22:23:05<2:55:24, 12.38s/it] + +{'loss': 0.4573, 'learning_rate': 6.882107160291851e-07, 'epoch': 0.88} + + 88%|████████▊ | 6528/7378 [22:23:05<2:55:24, 12.38s/it] + 88%|████████▊ | 6529/7378 [22:23:17<2:54:18, 12.32s/it] + +{'loss': 0.5108, 'learning_rate': 6.866111287815991e-07, 'epoch': 0.88} + + 88%|████████▊ | 6529/7378 [22:23:17<2:54:18, 12.32s/it] + 89%|████████▊ | 6530/7378 [22:23:29<2:54:39, 12.36s/it] + +{'loss': 0.3967, 'learning_rate': 6.850133365437605e-07, 'epoch': 0.89} + + 89%|████████▊ | 6530/7378 [22:23:29<2:54:39, 12.36s/it] + 89%|████████▊ | 6531/7378 [22:23:42<2:54:49, 12.38s/it] + +{'loss': 0.4617, 'learning_rate': 6.834173396236188e-07, 'epoch': 0.89} + + 89%|████████▊ | 6531/7378 [22:23:42<2:54:49, 12.38s/it] + 89%|████████▊ | 6532/7378 [22:23:54<2:54:01, 12.34s/it] + +{'loss': 0.4015, 'learning_rate': 6.818231383287788e-07, 'epoch': 0.89} + + 89%|████████▊ | 6532/7378 [22:23:54<2:54:01, 12.34s/it] + 89%|████████▊ | 6533/7378 [22:24:06<2:52:48, 12.27s/it] + +{'loss': 0.4568, 'learning_rate': 6.802307329664981e-07, 'epoch': 0.89} + + 89%|████████▊ | 6533/7378 [22:24:06<2:52:48, 12.27s/it] + 89%|████████▊ | 6534/7378 [22:24:18<2:52:48, 12.29s/it] + +{'loss': 0.4269, 'learning_rate': 6.786401238436869e-07, 'epoch': 0.89} + + 89%|████████▊ | 6534/7378 [22:24:18<2:52:48, 12.29s/it] + 89%|████████▊ | 6535/7378 [22:24:31<2:52:50, 12.30s/it] + +{'loss': 0.4664, 'learning_rate': 6.770513112669086e-07, 'epoch': 0.89} + + 89%|████████▊ | 6535/7378 
[22:24:31<2:52:50, 12.30s/it] + 89%|████████▊ | 6536/7378 [22:24:43<2:50:49, 12.17s/it] + +{'loss': 0.4308, 'learning_rate': 6.754642955423852e-07, 'epoch': 0.89} + + 89%|████████▊ | 6536/7378 [22:24:43<2:50:49, 12.17s/it] + 89%|████████▊ | 6537/7378 [22:24:55<2:52:17, 12.29s/it] + +{'loss': 0.4411, 'learning_rate': 6.738790769759873e-07, 'epoch': 0.89} + + 89%|████████▊ | 6537/7378 [22:24:55<2:52:17, 12.29s/it] + 89%|████████▊ | 6538/7378 [22:25:07<2:51:55, 12.28s/it] + +{'loss': 0.3951, 'learning_rate': 6.722956558732419e-07, 'epoch': 0.89} + + 89%|████████▊ | 6538/7378 [22:25:07<2:51:55, 12.28s/it] + 89%|████████▊ | 6539/7378 [22:25:20<2:52:42, 12.35s/it] + +{'loss': 0.3966, 'learning_rate': 6.707140325393269e-07, 'epoch': 0.89} + + 89%|████████▊ | 6539/7378 [22:25:20<2:52:42, 12.35s/it] + 89%|████████▊ | 6540/7378 [22:25:32<2:52:12, 12.33s/it] + +{'loss': 0.4184, 'learning_rate': 6.691342072790763e-07, 'epoch': 0.89} + + 89%|████████▊ | 6540/7378 [22:25:32<2:52:12, 12.33s/it] + 89%|████████▊ | 6541/7378 [22:25:45<2:51:50, 12.32s/it] + +{'loss': 0.4551, 'learning_rate': 6.675561803969765e-07, 'epoch': 0.89} + + 89%|████████▊ | 6541/7378 [22:25:45<2:51:50, 12.32s/it] + 89%|████████▊ | 6542/7378 [22:25:56<2:49:50, 12.19s/it] + +{'loss': 0.4377, 'learning_rate': 6.659799521971688e-07, 'epoch': 0.89} + + 89%|████████▊ | 6542/7378 [22:25:56<2:49:50, 12.19s/it] + 89%|████████▊ | 6543/7378 [22:26:08<2:48:53, 12.14s/it] + +{'loss': 0.3957, 'learning_rate': 6.644055229834457e-07, 'epoch': 0.89} + + 89%|████████▊ | 6543/7378 [22:26:08<2:48:53, 12.14s/it] + 89%|████████▊ | 6544/7378 [22:26:21<2:48:48, 12.14s/it] + +{'loss': 0.4251, 'learning_rate': 6.628328930592532e-07, 'epoch': 0.89} + + 89%|████████▊ | 6544/7378 [22:26:21<2:48:48, 12.14s/it] + 89%|████████▊ | 6545/7378 [22:26:33<2:49:21, 12.20s/it] + +{'loss': 0.4529, 'learning_rate': 6.612620627276889e-07, 'epoch': 0.89} + + 89%|████████▊ | 6545/7378 [22:26:33<2:49:21, 12.20s/it] + 89%|████████▊ | 6546/7378 
[22:26:45<2:49:06, 12.20s/it] + +{'loss': 0.4024, 'learning_rate': 6.596930322915107e-07, 'epoch': 0.89} + + 89%|████████▊ | 6546/7378 [22:26:45<2:49:06, 12.20s/it] + 89%|████████▊ | 6547/7378 [22:26:57<2:49:04, 12.21s/it] + +{'loss': 0.4384, 'learning_rate': 6.581258020531223e-07, 'epoch': 0.89} + + 89%|████████▊ | 6547/7378 [22:26:57<2:49:04, 12.21s/it] + 89%|████████▉ | 6548/7378 [22:27:09<2:48:27, 12.18s/it] + +{'loss': 0.446, 'learning_rate': 6.565603723145819e-07, 'epoch': 0.89} + + 89%|████████▉ | 6548/7378 [22:27:09<2:48:27, 12.18s/it] + 89%|████████▉ | 6549/7378 [22:27:22<2:48:27, 12.19s/it] + +{'loss': 0.4573, 'learning_rate': 6.549967433776005e-07, 'epoch': 0.89} + + 89%|████████▉ | 6549/7378 [22:27:22<2:48:27, 12.19s/it] + 89%|████████▉ | 6550/7378 [22:27:34<2:50:14, 12.34s/it] + +{'loss': 0.4112, 'learning_rate': 6.534349155435471e-07, 'epoch': 0.89} + + 89%|████████▉ | 6550/7378 [22:27:34<2:50:14, 12.34s/it] + 89%|████████▉ | 6551/7378 [22:27:47<2:49:18, 12.28s/it] + +{'loss': 0.4709, 'learning_rate': 6.518748891134364e-07, 'epoch': 0.89} + + 89%|████████▉ | 6551/7378 [22:27:47<2:49:18, 12.28s/it] + 89%|████████▉ | 6552/7378 [22:28:00<2:52:34, 12.54s/it] + +{'loss': 0.4702, 'learning_rate': 6.5031666438794e-07, 'epoch': 0.89} + + 89%|████████▉ | 6552/7378 [22:28:00<2:52:34, 12.54s/it] + 89%|████████▉ | 6553/7378 [22:28:12<2:50:29, 12.40s/it] + +{'loss': 0.3658, 'learning_rate': 6.487602416673811e-07, 'epoch': 0.89} + + 89%|████████▉ | 6553/7378 [22:28:12<2:50:29, 12.40s/it] + 89%|████████▉ | 6554/7378 [22:28:24<2:48:40, 12.28s/it] + +{'loss': 0.5167, 'learning_rate': 6.472056212517352e-07, 'epoch': 0.89} + + 89%|████████▉ | 6554/7378 [22:28:24<2:48:40, 12.28s/it] + 89%|████████▉ | 6555/7378 [22:28:36<2:48:09, 12.26s/it] + +{'loss': 0.4387, 'learning_rate': 6.456528034406317e-07, 'epoch': 0.89} + + 89%|████████▉ | 6555/7378 [22:28:36<2:48:09, 12.26s/it] + 89%|████████▉ | 6556/7378 [22:28:48<2:49:02, 12.34s/it] + +{'loss': 0.4553, 'learning_rate': 
6.441017885333534e-07, 'epoch': 0.89} + + 89%|████████▉ | 6556/7378 [22:28:48<2:49:02, 12.34s/it] + 89%|████████▉ | 6557/7378 [22:29:00<2:47:31, 12.24s/it] + +{'loss': 0.4078, 'learning_rate': 6.42552576828831e-07, 'epoch': 0.89} + + 89%|████████▉ | 6557/7378 [22:29:00<2:47:31, 12.24s/it] + 89%|████████▉ | 6558/7378 [22:29:13<2:46:32, 12.19s/it] + +{'loss': 0.4344, 'learning_rate': 6.410051686256524e-07, 'epoch': 0.89} + + 89%|████████▉ | 6558/7378 [22:29:13<2:46:32, 12.19s/it] + 89%|████████▉ | 6559/7378 [22:29:25<2:48:08, 12.32s/it] + +{'loss': 0.4276, 'learning_rate': 6.394595642220569e-07, 'epoch': 0.89} + + 89%|████████▉ | 6559/7378 [22:29:25<2:48:08, 12.32s/it] + 89%|████████▉ | 6560/7378 [22:29:37<2:47:17, 12.27s/it] + +{'loss': 0.4676, 'learning_rate': 6.37915763915935e-07, 'epoch': 0.89} + + 89%|████████▉ | 6560/7378 [22:29:37<2:47:17, 12.27s/it] + 89%|████████▉ | 6561/7378 [22:29:50<2:46:48, 12.25s/it] + +{'loss': 0.412, 'learning_rate': 6.363737680048299e-07, 'epoch': 0.89} + + 89%|████████▉ | 6561/7378 [22:29:50<2:46:48, 12.25s/it] + 89%|████████▉ | 6562/7378 [22:30:01<2:45:22, 12.16s/it] + +{'loss': 0.4529, 'learning_rate': 6.348335767859371e-07, 'epoch': 0.89} + + 89%|████████▉ | 6562/7378 [22:30:01<2:45:22, 12.16s/it] + 89%|████████▉ | 6563/7378 [22:30:14<2:44:42, 12.13s/it] + +{'loss': 0.4205, 'learning_rate': 6.332951905561025e-07, 'epoch': 0.89} + + 89%|████████▉ | 6563/7378 [22:30:14<2:44:42, 12.13s/it] + 89%|████████▉ | 6564/7378 [22:30:26<2:44:15, 12.11s/it] + +{'loss': 0.4255, 'learning_rate': 6.3175860961183e-07, 'epoch': 0.89} + + 89%|████████▉ | 6564/7378 [22:30:26<2:44:15, 12.11s/it] + 89%|████████▉ | 6565/7378 [22:30:38<2:46:09, 12.26s/it] + +{'loss': 0.4217, 'learning_rate': 6.302238342492683e-07, 'epoch': 0.89} + + 89%|████████▉ | 6565/7378 [22:30:38<2:46:09, 12.26s/it] + 89%|████████▉ | 6566/7378 [22:30:51<2:47:07, 12.35s/it] + +{'loss': 0.4594, 'learning_rate': 6.286908647642231e-07, 'epoch': 0.89} + + 89%|████████▉ | 6566/7378 
[22:30:51<2:47:07, 12.35s/it] + 89%|████████▉ | 6567/7378 [22:31:03<2:45:22, 12.23s/it] + +{'loss': 0.3818, 'learning_rate': 6.27159701452148e-07, 'epoch': 0.89} + + 89%|████████▉ | 6567/7378 [22:31:03<2:45:22, 12.23s/it] + 89%|████████▉ | 6568/7378 [22:31:15<2:44:20, 12.17s/it] + +{'loss': 0.4305, 'learning_rate': 6.256303446081535e-07, 'epoch': 0.89} + + 89%|████████▉ | 6568/7378 [22:31:15<2:44:20, 12.17s/it] + 89%|████████▉ | 6569/7378 [22:31:27<2:43:11, 12.10s/it] + +{'loss': 0.4717, 'learning_rate': 6.241027945269973e-07, 'epoch': 0.89} + + 89%|████████▉ | 6569/7378 [22:31:27<2:43:11, 12.10s/it] + 89%|████████▉ | 6570/7378 [22:31:39<2:42:46, 12.09s/it] + +{'loss': 0.4083, 'learning_rate': 6.225770515030916e-07, 'epoch': 0.89} + + 89%|████████▉ | 6570/7378 [22:31:39<2:42:46, 12.09s/it] + 89%|████████▉ | 6571/7378 [22:31:51<2:42:41, 12.10s/it] + +{'loss': 0.468, 'learning_rate': 6.210531158304977e-07, 'epoch': 0.89} + + 89%|████████▉ | 6571/7378 [22:31:51<2:42:41, 12.10s/it] + 89%|████████▉ | 6572/7378 [22:32:03<2:43:18, 12.16s/it] + +{'loss': 0.4857, 'learning_rate': 6.195309878029332e-07, 'epoch': 0.89} + + 89%|████████▉ | 6572/7378 [22:32:03<2:43:18, 12.16s/it] + 89%|████████▉ | 6573/7378 [22:32:15<2:43:34, 12.19s/it] + +{'loss': 0.45, 'learning_rate': 6.18010667713761e-07, 'epoch': 0.89} + + 89%|████████▉ | 6573/7378 [22:32:15<2:43:34, 12.19s/it] + 89%|████████▉ | 6574/7378 [22:32:28<2:43:23, 12.19s/it] + +{'loss': 0.4613, 'learning_rate': 6.164921558560033e-07, 'epoch': 0.89} + + 89%|████████▉ | 6574/7378 [22:32:28<2:43:23, 12.19s/it] + 89%|████████▉ | 6575/7378 [22:32:40<2:42:19, 12.13s/it] + +{'loss': 0.3293, 'learning_rate': 6.149754525223262e-07, 'epoch': 0.89} + + 89%|████████▉ | 6575/7378 [22:32:40<2:42:19, 12.13s/it] + 89%|████████▉ | 6576/7378 [22:32:52<2:42:01, 12.12s/it] + +{'loss': 0.4856, 'learning_rate': 6.134605580050523e-07, 'epoch': 0.89} + + 89%|████████▉ | 6576/7378 [22:32:52<2:42:01, 12.12s/it] + 89%|████████▉ | 6577/7378 
[22:33:04<2:43:03, 12.21s/it] + +{'loss': 0.413, 'learning_rate': 6.119474725961505e-07, 'epoch': 0.89} + + 89%|████████▉ | 6577/7378 [22:33:04<2:43:03, 12.21s/it] + 89%|████████▉ | 6578/7378 [22:33:16<2:41:38, 12.12s/it] + +{'loss': 0.4355, 'learning_rate': 6.104361965872485e-07, 'epoch': 0.89} + + 89%|████████▉ | 6578/7378 [22:33:16<2:41:38, 12.12s/it] + 89%|████████▉ | 6579/7378 [22:33:28<2:40:17, 12.04s/it] + +{'loss': 0.397, 'learning_rate': 6.089267302696189e-07, 'epoch': 0.89} + + 89%|████████▉ | 6579/7378 [22:33:28<2:40:17, 12.04s/it] + 89%|████████▉ | 6580/7378 [22:33:40<2:39:53, 12.02s/it] + +{'loss': 0.4156, 'learning_rate': 6.07419073934189e-07, 'epoch': 0.89} + + 89%|████████▉ | 6580/7378 [22:33:40<2:39:53, 12.02s/it] + 89%|████████▉ | 6581/7378 [22:33:52<2:40:03, 12.05s/it] + +{'loss': 0.398, 'learning_rate': 6.05913227871534e-07, 'epoch': 0.89} + + 89%|████████▉ | 6581/7378 [22:33:52<2:40:03, 12.05s/it] + 89%|████████▉ | 6582/7378 [22:34:04<2:41:34, 12.18s/it] + +{'loss': 0.4356, 'learning_rate': 6.04409192371882e-07, 'epoch': 0.89} + + 89%|████████▉ | 6582/7378 [22:34:04<2:41:34, 12.18s/it] + 89%|████████▉ | 6583/7378 [22:34:17<2:43:54, 12.37s/it] + +{'loss': 0.46, 'learning_rate': 6.029069677251143e-07, 'epoch': 0.89} + + 89%|████████▉ | 6583/7378 [22:34:17<2:43:54, 12.37s/it] + 89%|████████▉ | 6584/7378 [22:34:30<2:44:13, 12.41s/it] + +{'loss': 0.387, 'learning_rate': 6.014065542207603e-07, 'epoch': 0.89} + + 89%|████████▉ | 6584/7378 [22:34:30<2:44:13, 12.41s/it] + 89%|████████▉ | 6585/7378 [22:34:42<2:43:14, 12.35s/it] + +{'loss': 0.4203, 'learning_rate': 5.999079521480011e-07, 'epoch': 0.89} + + 89%|████████▉ | 6585/7378 [22:34:42<2:43:14, 12.35s/it] + 89%|████████▉ | 6586/7378 [22:34:54<2:42:25, 12.31s/it] + +{'loss': 0.4377, 'learning_rate': 5.984111617956678e-07, 'epoch': 0.89} + + 89%|████████▉ | 6586/7378 [22:34:54<2:42:25, 12.31s/it] + 89%|████████▉ | 6587/7378 [22:35:07<2:42:40, 12.34s/it] + +{'loss': 0.3688, 'learning_rate': 
5.969161834522452e-07, 'epoch': 0.89} + + 89%|████████▉ | 6587/7378 [22:35:07<2:42:40, 12.34s/it] + 89%|████████▉ | 6588/7378 [22:35:19<2:41:50, 12.29s/it] + +{'loss': 0.4084, 'learning_rate': 5.954230174058662e-07, 'epoch': 0.89} + + 89%|████████▉ | 6588/7378 [22:35:19<2:41:50, 12.29s/it] + 89%|████████▉ | 6589/7378 [22:35:31<2:41:18, 12.27s/it] + +{'loss': 0.4649, 'learning_rate': 5.939316639443149e-07, 'epoch': 0.89} + + 89%|████████▉ | 6589/7378 [22:35:31<2:41:18, 12.27s/it] + 89%|████████▉ | 6590/7378 [22:35:43<2:41:03, 12.26s/it] + +{'loss': 0.4913, 'learning_rate': 5.92442123355027e-07, 'epoch': 0.89} + + 89%|████████▉ | 6590/7378 [22:35:43<2:41:03, 12.26s/it] + 89%|████████▉ | 6591/7378 [22:35:56<2:40:45, 12.26s/it] + +{'loss': 0.4272, 'learning_rate': 5.909543959250852e-07, 'epoch': 0.89} + + 89%|████████▉ | 6591/7378 [22:35:56<2:40:45, 12.26s/it] + 89%|████████▉ | 6592/7378 [22:36:08<2:41:13, 12.31s/it] + +{'loss': 0.452, 'learning_rate': 5.894684819412289e-07, 'epoch': 0.89} + + 89%|████████▉ | 6592/7378 [22:36:08<2:41:13, 12.31s/it] + 89%|████████▉ | 6593/7378 [22:36:20<2:40:49, 12.29s/it] + +{'loss': 0.4475, 'learning_rate': 5.879843816898445e-07, 'epoch': 0.89} + + 89%|████████▉ | 6593/7378 [22:36:20<2:40:49, 12.29s/it] + 89%|████████▉ | 6594/7378 [22:36:33<2:40:42, 12.30s/it] + +{'loss': 0.3808, 'learning_rate': 5.865020954569689e-07, 'epoch': 0.89} + + 89%|████████▉ | 6594/7378 [22:36:33<2:40:42, 12.30s/it] + 89%|████████▉ | 6595/7378 [22:36:45<2:40:33, 12.30s/it] + +{'loss': 0.4469, 'learning_rate': 5.850216235282858e-07, 'epoch': 0.89} + + 89%|████████▉ | 6595/7378 [22:36:45<2:40:33, 12.30s/it] + 89%|████████▉ | 6596/7378 [22:36:57<2:39:45, 12.26s/it] + +{'loss': 0.4559, 'learning_rate': 5.83542966189139e-07, 'epoch': 0.89} + + 89%|████████▉ | 6596/7378 [22:36:57<2:39:45, 12.26s/it] + 89%|████████▉ | 6597/7378 [22:37:09<2:40:04, 12.30s/it] + +{'loss': 0.4197, 'learning_rate': 5.820661237245128e-07, 'epoch': 0.89} + + 89%|████████▉ | 6597/7378 
[22:37:09<2:40:04, 12.30s/it] + 89%|████████▉ | 6598/7378 [22:37:22<2:40:36, 12.35s/it] + +{'loss': 0.442, 'learning_rate': 5.805910964190465e-07, 'epoch': 0.89} + + 89%|████████▉ | 6598/7378 [22:37:22<2:40:36, 12.35s/it] + 89%|████████▉ | 6599/7378 [22:37:34<2:39:55, 12.32s/it] + +{'loss': 0.4575, 'learning_rate': 5.791178845570288e-07, 'epoch': 0.89} + + 89%|████████▉ | 6599/7378 [22:37:34<2:39:55, 12.32s/it] + 89%|████████▉ | 6600/7378 [22:37:46<2:39:38, 12.31s/it] + +{'loss': 0.4132, 'learning_rate': 5.776464884223954e-07, 'epoch': 0.89} + + 89%|████████▉ | 6600/7378 [22:37:46<2:39:38, 12.31s/it] + 89%|████████▉ | 6601/7378 [22:37:59<2:39:38, 12.33s/it] + +{'loss': 0.4251, 'learning_rate': 5.76176908298739e-07, 'epoch': 0.89} + + 89%|████████▉ | 6601/7378 [22:37:59<2:39:38, 12.33s/it] + 89%|████████▉ | 6602/7378 [22:38:11<2:40:00, 12.37s/it] + +{'loss': 0.4023, 'learning_rate': 5.747091444692953e-07, 'epoch': 0.89} + + 89%|████████▉ | 6602/7378 [22:38:11<2:40:00, 12.37s/it] + 89%|████████▉ | 6603/7378 [22:38:24<2:39:30, 12.35s/it] + +{'loss': 0.3876, 'learning_rate': 5.73243197216955e-07, 'epoch': 0.89} + + 89%|████████▉ | 6603/7378 [22:38:24<2:39:30, 12.35s/it] + 90%|████████▉ | 6604/7378 [22:38:36<2:40:30, 12.44s/it] + +{'loss': 0.4362, 'learning_rate': 5.717790668242551e-07, 'epoch': 0.9} + + 90%|████████▉ | 6604/7378 [22:38:36<2:40:30, 12.44s/it] + 90%|████████▉ | 6605/7378 [22:38:49<2:39:54, 12.41s/it] + +{'loss': 0.4743, 'learning_rate': 5.703167535733811e-07, 'epoch': 0.9} + + 90%|████████▉ | 6605/7378 [22:38:49<2:39:54, 12.41s/it] + 90%|████████▉ | 6606/7378 [22:39:01<2:41:17, 12.54s/it] + +{'loss': 0.4871, 'learning_rate': 5.688562577461765e-07, 'epoch': 0.9} + + 90%|████████▉ | 6606/7378 [22:39:01<2:41:17, 12.54s/it] + 90%|████████▉ | 6607/7378 [22:39:13<2:39:15, 12.39s/it] + +{'loss': 0.4549, 'learning_rate': 5.67397579624126e-07, 'epoch': 0.9} + + 90%|████████▉ | 6607/7378 [22:39:13<2:39:15, 12.39s/it] + 90%|████████▉ | 6608/7378 [22:39:26<2:38:33, 
12.36s/it] + +{'loss': 0.4045, 'learning_rate': 5.659407194883671e-07, 'epoch': 0.9} + + 90%|████████▉ | 6608/7378 [22:39:26<2:38:33, 12.36s/it] + 90%|████████▉ | 6609/7378 [22:39:38<2:37:52, 12.32s/it] + +{'loss': 0.4728, 'learning_rate': 5.644856776196849e-07, 'epoch': 0.9} + + 90%|████████▉ | 6609/7378 [22:39:38<2:37:52, 12.32s/it] + 90%|████████▉ | 6610/7378 [22:39:50<2:35:16, 12.13s/it] + +{'loss': 0.4127, 'learning_rate': 5.6303245429852e-07, 'epoch': 0.9} + + 90%|████████▉ | 6610/7378 [22:39:50<2:35:16, 12.13s/it] + 90%|████████▉ | 6611/7378 [22:40:02<2:37:29, 12.32s/it] + +{'loss': 0.4777, 'learning_rate': 5.615810498049557e-07, 'epoch': 0.9} + + 90%|████████▉ | 6611/7378 [22:40:02<2:37:29, 12.32s/it] + 90%|████████▉ | 6612/7378 [22:40:14<2:35:53, 12.21s/it] + +{'loss': 0.3919, 'learning_rate': 5.601314644187283e-07, 'epoch': 0.9} + + 90%|████████▉ | 6612/7378 [22:40:14<2:35:53, 12.21s/it] + 90%|████████▉ | 6613/7378 [22:40:27<2:36:58, 12.31s/it] + +{'loss': 0.4459, 'learning_rate': 5.586836984192223e-07, 'epoch': 0.9} + + 90%|████████▉ | 6613/7378 [22:40:27<2:36:58, 12.31s/it] + 90%|████████▉ | 6614/7378 [22:40:39<2:35:57, 12.25s/it] + +{'loss': 0.4244, 'learning_rate': 5.572377520854699e-07, 'epoch': 0.9} + + 90%|████████▉ | 6614/7378 [22:40:39<2:35:57, 12.25s/it] + 90%|████████▉ | 6615/7378 [22:40:51<2:36:44, 12.33s/it] + +{'loss': 0.431, 'learning_rate': 5.557936256961571e-07, 'epoch': 0.9} + + 90%|████████▉ | 6615/7378 [22:40:51<2:36:44, 12.33s/it] + 90%|████████▉ | 6616/7378 [22:41:04<2:36:24, 12.32s/it] + +{'loss': 0.4756, 'learning_rate': 5.54351319529618e-07, 'epoch': 0.9} + + 90%|████████▉ | 6616/7378 [22:41:04<2:36:24, 12.32s/it] + 90%|████████▉ | 6617/7378 [22:41:16<2:35:46, 12.28s/it] + +{'loss': 0.4673, 'learning_rate': 5.529108338638334e-07, 'epoch': 0.9} + + 90%|████████▉ | 6617/7378 [22:41:16<2:35:46, 12.28s/it] + 90%|████████▉ | 6618/7378 [22:41:28<2:36:04, 12.32s/it] + +{'loss': 0.4216, 'learning_rate': 5.514721689764325e-07, 'epoch': 
0.9} + + 90%|████████▉ | 6618/7378 [22:41:28<2:36:04, 12.32s/it] + 90%|████████▉ | 6619/7378 [22:41:41<2:36:46, 12.39s/it] + +{'loss': 0.4287, 'learning_rate': 5.500353251446955e-07, 'epoch': 0.9} + + 90%|████████▉ | 6619/7378 [22:41:41<2:36:46, 12.39s/it] + 90%|████████▉ | 6620/7378 [22:41:54<2:37:23, 12.46s/it] + +{'loss': 0.4128, 'learning_rate': 5.486003026455544e-07, 'epoch': 0.9} + + 90%|████████▉ | 6620/7378 [22:41:54<2:37:23, 12.46s/it] + 90%|████████▉ | 6621/7378 [22:42:06<2:38:03, 12.53s/it] + +{'loss': 0.4687, 'learning_rate': 5.471671017555846e-07, 'epoch': 0.9} + + 90%|████████▉ | 6621/7378 [22:42:06<2:38:03, 12.53s/it] + 90%|████████▉ | 6622/7378 [22:42:19<2:36:51, 12.45s/it] + +{'loss': 0.4759, 'learning_rate': 5.457357227510152e-07, 'epoch': 0.9} + + 90%|████████▉ | 6622/7378 [22:42:19<2:36:51, 12.45s/it] + 90%|████████▉ | 6623/7378 [22:42:31<2:35:16, 12.34s/it] + +{'loss': 0.4131, 'learning_rate': 5.443061659077198e-07, 'epoch': 0.9} + + 90%|████████▉ | 6623/7378 [22:42:31<2:35:16, 12.34s/it] + 90%|████████▉ | 6624/7378 [22:42:43<2:33:27, 12.21s/it] + +{'loss': 0.382, 'learning_rate': 5.428784315012236e-07, 'epoch': 0.9} + + 90%|████████▉ | 6624/7378 [22:42:43<2:33:27, 12.21s/it] + 90%|████████▉ | 6625/7378 [22:42:55<2:35:45, 12.41s/it] + +{'loss': 0.4161, 'learning_rate': 5.414525198067011e-07, 'epoch': 0.9} + + 90%|████████▉ | 6625/7378 [22:42:55<2:35:45, 12.41s/it] + 90%|████████▉ | 6626/7378 [22:43:08<2:35:00, 12.37s/it] + +{'loss': 0.4601, 'learning_rate': 5.400284310989746e-07, 'epoch': 0.9} + + 90%|████████▉ | 6626/7378 [22:43:08<2:35:00, 12.37s/it] + 90%|████████▉ | 6627/7378 [22:43:20<2:34:16, 12.33s/it] + +{'loss': 0.4646, 'learning_rate': 5.386061656525143e-07, 'epoch': 0.9} + + 90%|████████▉ | 6627/7378 [22:43:20<2:34:16, 12.33s/it] + 90%|████████▉ | 6628/7378 [22:43:32<2:33:06, 12.25s/it] + +{'loss': 0.3663, 'learning_rate': 5.371857237414379e-07, 'epoch': 0.9} + + 90%|████████▉ | 6628/7378 [22:43:32<2:33:06, 12.25s/it] + 90%|████████▉ 
| 6629/7378 [22:43:45<2:34:57, 12.41s/it] + +{'loss': 0.4193, 'learning_rate': 5.357671056395164e-07, 'epoch': 0.9} + + 90%|████████▉ | 6629/7378 [22:43:45<2:34:57, 12.41s/it] + 90%|████████▉ | 6630/7378 [22:43:57<2:33:41, 12.33s/it] + +{'loss': 0.4038, 'learning_rate': 5.343503116201643e-07, 'epoch': 0.9} + + 90%|████████▉ | 6630/7378 [22:43:57<2:33:41, 12.33s/it] + 90%|████████▉ | 6631/7378 [22:44:09<2:34:07, 12.38s/it] + +{'loss': 0.3599, 'learning_rate': 5.329353419564476e-07, 'epoch': 0.9} + + 90%|████████▉ | 6631/7378 [22:44:09<2:34:07, 12.38s/it] + 90%|████████▉ | 6632/7378 [22:44:21<2:32:47, 12.29s/it] + +{'loss': 0.4894, 'learning_rate': 5.315221969210782e-07, 'epoch': 0.9} + + 90%|████████▉ | 6632/7378 [22:44:21<2:32:47, 12.29s/it] + 90%|████████▉ | 6633/7378 [22:44:33<2:31:36, 12.21s/it] + +{'loss': 0.4203, 'learning_rate': 5.301108767864171e-07, 'epoch': 0.9} + + 90%|████████▉ | 6633/7378 [22:44:33<2:31:36, 12.21s/it] + 90%|████████▉ | 6634/7378 [22:44:46<2:31:07, 12.19s/it] + +{'loss': 0.409, 'learning_rate': 5.287013818244768e-07, 'epoch': 0.9} + + 90%|████████▉ | 6634/7378 [22:44:46<2:31:07, 12.19s/it] + 90%|████████▉ | 6635/7378 [22:44:57<2:29:22, 12.06s/it] + +{'loss': 0.407, 'learning_rate': 5.272937123069133e-07, 'epoch': 0.9} + + 90%|████████▉ | 6635/7378 [22:44:57<2:29:22, 12.06s/it] + 90%|████████▉ | 6636/7378 [22:45:10<2:29:34, 12.10s/it] + +{'loss': 0.3905, 'learning_rate': 5.258878685050339e-07, 'epoch': 0.9} + + 90%|████████▉ | 6636/7378 [22:45:10<2:29:34, 12.10s/it] + 90%|████████▉ | 6637/7378 [22:45:22<2:29:46, 12.13s/it] + +{'loss': 0.4194, 'learning_rate': 5.244838506897909e-07, 'epoch': 0.9} + + 90%|████████▉ | 6637/7378 [22:45:22<2:29:46, 12.13s/it] + 90%|████████▉ | 6638/7378 [22:45:34<2:31:34, 12.29s/it] + +{'loss': 0.3984, 'learning_rate': 5.230816591317899e-07, 'epoch': 0.9} + + 90%|████████▉ | 6638/7378 [22:45:34<2:31:34, 12.29s/it] + 90%|████████▉ | 6639/7378 [22:45:47<2:32:23, 12.37s/it] + +{'loss': 0.4711, 'learning_rate': 
5.216812941012794e-07, 'epoch': 0.9} + + 90%|████████▉ | 6639/7378 [22:45:47<2:32:23, 12.37s/it] + 90%|████████▉ | 6640/7378 [22:45:59<2:31:44, 12.34s/it] + +{'loss': 0.4203, 'learning_rate': 5.202827558681589e-07, 'epoch': 0.9} + + 90%|████████▉ | 6640/7378 [22:45:59<2:31:44, 12.34s/it] + 90%|█████████ | 6641/7378 [22:46:11<2:30:40, 12.27s/it] + +{'loss': 0.4266, 'learning_rate': 5.188860447019728e-07, 'epoch': 0.9} + + 90%|█████████ | 6641/7378 [22:46:11<2:30:40, 12.27s/it] + 90%|█████████ | 6642/7378 [22:46:24<2:31:07, 12.32s/it] + +{'loss': 0.4866, 'learning_rate': 5.174911608719157e-07, 'epoch': 0.9} + + 90%|█████████ | 6642/7378 [22:46:24<2:31:07, 12.32s/it] + 90%|█████████ | 6643/7378 [22:46:36<2:32:05, 12.42s/it] + +{'loss': 0.4569, 'learning_rate': 5.160981046468317e-07, 'epoch': 0.9} + + 90%|█████████ | 6643/7378 [22:46:36<2:32:05, 12.42s/it] + 90%|█████████ | 6644/7378 [22:46:49<2:31:09, 12.36s/it] + +{'loss': 0.4846, 'learning_rate': 5.14706876295209e-07, 'epoch': 0.9} + + 90%|█████████ | 6644/7378 [22:46:49<2:31:09, 12.36s/it] + 90%|█████████ | 6645/7378 [22:47:01<2:30:33, 12.32s/it] + +{'loss': 0.4111, 'learning_rate': 5.133174760851856e-07, 'epoch': 0.9} + + 90%|█████████ | 6645/7378 [22:47:01<2:30:33, 12.32s/it] + 90%|█████████ | 6646/7378 [22:47:13<2:30:20, 12.32s/it] + +{'loss': 0.438, 'learning_rate': 5.119299042845449e-07, 'epoch': 0.9} + + 90%|█████████ | 6646/7378 [22:47:13<2:30:20, 12.32s/it] + 90%|█████████ | 6647/7378 [22:47:25<2:28:23, 12.18s/it] + +{'loss': 0.4418, 'learning_rate': 5.10544161160722e-07, 'epoch': 0.9} + + 90%|█████████ | 6647/7378 [22:47:25<2:28:23, 12.18s/it] + 90%|█████████ | 6648/7378 [22:47:37<2:27:50, 12.15s/it] + +{'loss': 0.4543, 'learning_rate': 5.091602469807965e-07, 'epoch': 0.9} + + 90%|█████████ | 6648/7378 [22:47:37<2:27:50, 12.15s/it] + 90%|█████████ | 6649/7378 [22:47:49<2:27:57, 12.18s/it] + +{'loss': 0.4338, 'learning_rate': 5.077781620114952e-07, 'epoch': 0.9} + + 90%|█████████ | 6649/7378 
[22:47:49<2:27:57, 12.18s/it] + 90%|█████████ | 6650/7378 [22:48:01<2:27:20, 12.14s/it] + +{'loss': 0.4134, 'learning_rate': 5.063979065191948e-07, 'epoch': 0.9} + + 90%|█████████ | 6650/7378 [22:48:01<2:27:20, 12.14s/it] + 90%|█████████ | 6651/7378 [22:48:14<2:27:27, 12.17s/it] + +{'loss': 0.4914, 'learning_rate': 5.050194807699149e-07, 'epoch': 0.9} + + 90%|█████████ | 6651/7378 [22:48:14<2:27:27, 12.17s/it] + 90%|█████████ | 6652/7378 [22:48:26<2:28:17, 12.26s/it] + +{'loss': 0.3835, 'learning_rate': 5.036428850293295e-07, 'epoch': 0.9} + + 90%|█████████ | 6652/7378 [22:48:26<2:28:17, 12.26s/it] + 90%|█████████ | 6653/7378 [22:48:39<2:28:51, 12.32s/it] + +{'loss': 0.4994, 'learning_rate': 5.022681195627543e-07, 'epoch': 0.9} + + 90%|█████████ | 6653/7378 [22:48:39<2:28:51, 12.32s/it] + 90%|█████████ | 6654/7378 [22:48:51<2:27:27, 12.22s/it] + +{'loss': 0.4635, 'learning_rate': 5.00895184635154e-07, 'epoch': 0.9} + + 90%|█████████ | 6654/7378 [22:48:51<2:27:27, 12.22s/it] + 90%|█████████ | 6655/7378 [22:49:03<2:29:33, 12.41s/it] + +{'loss': 0.421, 'learning_rate': 4.99524080511139e-07, 'epoch': 0.9} + + 90%|█████████ | 6655/7378 [22:49:03<2:29:33, 12.41s/it] + 90%|█████████ | 6656/7378 [22:49:15<2:26:59, 12.21s/it] + +{'loss': 0.4277, 'learning_rate': 4.981548074549669e-07, 'epoch': 0.9} + + 90%|█████████ | 6656/7378 [22:49:15<2:26:59, 12.21s/it] + 90%|█████████ | 6657/7378 [22:49:28<2:27:22, 12.26s/it] + +{'loss': 0.4855, 'learning_rate': 4.967873657305478e-07, 'epoch': 0.9} + + 90%|█████████ | 6657/7378 [22:49:28<2:27:22, 12.26s/it] + 90%|█████████ | 6658/7378 [22:49:39<2:25:43, 12.14s/it] + +{'loss': 0.4118, 'learning_rate': 4.954217556014318e-07, 'epoch': 0.9} + + 90%|█████████ | 6658/7378 [22:49:39<2:25:43, 12.14s/it] + 90%|█████████ | 6659/7378 [22:49:52<2:25:17, 12.12s/it] + +{'loss': 0.4228, 'learning_rate': 4.940579773308196e-07, 'epoch': 0.9} + + 90%|█████████ | 6659/7378 [22:49:52<2:25:17, 12.12s/it] + 90%|█████████ | 6660/7378 [22:50:04<2:24:50, 
12.10s/it] + +{'loss': 0.4207, 'learning_rate': 4.926960311815587e-07, 'epoch': 0.9} + + 90%|█████████ | 6660/7378 [22:50:04<2:24:50, 12.10s/it] + 90%|█████████ | 6661/7378 [22:50:16<2:26:06, 12.23s/it] + +{'loss': 0.4658, 'learning_rate': 4.913359174161403e-07, 'epoch': 0.9} + + 90%|█████████ | 6661/7378 [22:50:16<2:26:06, 12.23s/it] + 90%|█████████ | 6662/7378 [22:50:28<2:25:36, 12.20s/it] + +{'loss': 0.4175, 'learning_rate': 4.89977636296709e-07, 'epoch': 0.9} + + 90%|█████████ | 6662/7378 [22:50:28<2:25:36, 12.20s/it] + 90%|█████████ | 6663/7378 [22:50:41<2:26:23, 12.29s/it] + +{'loss': 0.4433, 'learning_rate': 4.8862118808505e-07, 'epoch': 0.9} + + 90%|█████████ | 6663/7378 [22:50:41<2:26:23, 12.29s/it] + 90%|█████████ | 6664/7378 [22:50:53<2:26:41, 12.33s/it] + +{'loss': 0.4424, 'learning_rate': 4.872665730425973e-07, 'epoch': 0.9} + + 90%|█████████ | 6664/7378 [22:50:53<2:26:41, 12.33s/it] + 90%|█████████ | 6665/7378 [22:51:06<2:27:14, 12.39s/it] + +{'loss': 0.4296, 'learning_rate': 4.859137914304313e-07, 'epoch': 0.9} + + 90%|█████████ | 6665/7378 [22:51:06<2:27:14, 12.39s/it] + 90%|█████████ | 6666/7378 [22:51:18<2:26:39, 12.36s/it] + +{'loss': 0.4115, 'learning_rate': 4.845628435092797e-07, 'epoch': 0.9} + + 90%|█████████ | 6666/7378 [22:51:18<2:26:39, 12.36s/it] + 90%|█████████ | 6667/7378 [22:51:30<2:25:41, 12.29s/it] + +{'loss': 0.3972, 'learning_rate': 4.832137295395189e-07, 'epoch': 0.9} + + 90%|█████████ | 6667/7378 [22:51:30<2:25:41, 12.29s/it] + 90%|█████████ | 6668/7378 [22:51:42<2:24:56, 12.25s/it] + +{'loss': 0.4407, 'learning_rate': 4.818664497811664e-07, 'epoch': 0.9} + + 90%|█████████ | 6668/7378 [22:51:42<2:24:56, 12.25s/it] + 90%|█████████ | 6669/7378 [22:51:55<2:25:09, 12.28s/it] + +{'loss': 0.4308, 'learning_rate': 4.805210044938913e-07, 'epoch': 0.9} + + 90%|█████████ | 6669/7378 [22:51:55<2:25:09, 12.28s/it] + 90%|█████████ | 6670/7378 [22:52:07<2:25:25, 12.32s/it] + +{'loss': 0.347, 'learning_rate': 4.791773939370048e-07, 'epoch': 
0.9} + + 90%|█████████ | 6670/7378 [22:52:07<2:25:25, 12.32s/it] + 90%|█████████ | 6671/7378 [22:52:19<2:24:13, 12.24s/it] + +{'loss': 0.4084, 'learning_rate': 4.778356183694688e-07, 'epoch': 0.9} + + 90%|█████████ | 6671/7378 [22:52:19<2:24:13, 12.24s/it] + 90%|█████████ | 6672/7378 [22:52:31<2:24:06, 12.25s/it] + +{'loss': 0.4716, 'learning_rate': 4.764956780498897e-07, 'epoch': 0.9} + + 90%|█████████ | 6672/7378 [22:52:31<2:24:06, 12.25s/it] + 90%|█████████ | 6673/7378 [22:52:44<2:23:38, 12.22s/it] + +{'loss': 0.3951, 'learning_rate': 4.7515757323651877e-07, 'epoch': 0.9} + + 90%|█████████ | 6673/7378 [22:52:44<2:23:38, 12.22s/it] + 90%|█████████ | 6674/7378 [22:52:56<2:23:04, 12.19s/it] + +{'loss': 0.4802, 'learning_rate': 4.738213041872552e-07, 'epoch': 0.9} + + 90%|█████████ | 6674/7378 [22:52:56<2:23:04, 12.19s/it] + 90%|█████████ | 6675/7378 [22:53:08<2:24:18, 12.32s/it] + +{'loss': 0.3501, 'learning_rate': 4.72486871159642e-07, 'epoch': 0.9} + + 90%|█████████ | 6675/7378 [22:53:08<2:24:18, 12.32s/it] + 90%|█████████ | 6676/7378 [22:53:21<2:25:41, 12.45s/it] + +{'loss': 0.4475, 'learning_rate': 4.711542744108744e-07, 'epoch': 0.9} + + 90%|█████████ | 6676/7378 [22:53:21<2:25:41, 12.45s/it] + 90%|█████████ | 6677/7378 [22:53:33<2:24:24, 12.36s/it] + +{'loss': 0.4571, 'learning_rate': 4.6982351419778695e-07, 'epoch': 0.9} + + 90%|█████████ | 6677/7378 [22:53:33<2:24:24, 12.36s/it] + 91%|█████████ | 6678/7378 [22:53:45<2:23:50, 12.33s/it] + +{'loss': 0.4398, 'learning_rate': 4.684945907768623e-07, 'epoch': 0.91} + + 91%|█████████ | 6678/7378 [22:53:45<2:23:50, 12.33s/it] + 91%|█████████ | 6679/7378 [22:53:57<2:22:37, 12.24s/it] + +{'loss': 0.4171, 'learning_rate': 4.671675044042301e-07, 'epoch': 0.91} + + 91%|█████████ | 6679/7378 [22:53:57<2:22:37, 12.24s/it] + 91%|█████████ | 6680/7378 [22:54:10<2:22:52, 12.28s/it] + +{'loss': 0.452, 'learning_rate': 4.6584225533566674e-07, 'epoch': 0.91} + + 91%|█████████ | 6680/7378 [22:54:10<2:22:52, 12.28s/it] + 
91%|█████████ | 6681/7378 [22:54:22<2:22:29, 12.27s/it] + +{'loss': 0.3605, 'learning_rate': 4.645188438265924e-07, 'epoch': 0.91} + + 91%|█████████ | 6681/7378 [22:54:22<2:22:29, 12.27s/it] + 91%|█████████ | 6682/7378 [22:54:35<2:24:21, 12.45s/it] + +{'loss': 0.4398, 'learning_rate': 4.6319727013207416e-07, 'epoch': 0.91} + + 91%|█████████ | 6682/7378 [22:54:35<2:24:21, 12.45s/it] + 91%|█████████ | 6683/7378 [22:54:47<2:24:23, 12.47s/it] + +{'loss': 0.4308, 'learning_rate': 4.618775345068238e-07, 'epoch': 0.91} + + 91%|█████████ | 6683/7378 [22:54:47<2:24:23, 12.47s/it] + 91%|█████████ | 6684/7378 [22:54:59<2:22:17, 12.30s/it] + +{'loss': 0.4608, 'learning_rate': 4.605596372051979e-07, 'epoch': 0.91} + + 91%|█████████ | 6684/7378 [22:54:59<2:22:17, 12.30s/it] + 91%|█████████ | 6685/7378 [22:55:12<2:23:09, 12.39s/it] + +{'loss': 0.4278, 'learning_rate': 4.592435784812055e-07, 'epoch': 0.91} + + 91%|█████████ | 6685/7378 [22:55:12<2:23:09, 12.39s/it] + 91%|█████████ | 6686/7378 [22:55:24<2:22:48, 12.38s/it] + +{'loss': 0.3985, 'learning_rate': 4.579293585884925e-07, 'epoch': 0.91} + + 91%|█████████ | 6686/7378 [22:55:24<2:22:48, 12.38s/it] + 91%|█████████ | 6687/7378 [22:55:37<2:22:29, 12.37s/it] + +{'loss': 0.4456, 'learning_rate': 4.5661697778035643e-07, 'epoch': 0.91} + + 91%|█████████ | 6687/7378 [22:55:37<2:22:29, 12.37s/it] + 91%|█████████ | 6688/7378 [22:55:49<2:20:58, 12.26s/it] + +{'loss': 0.4973, 'learning_rate': 4.553064363097337e-07, 'epoch': 0.91} + + 91%|█████████ | 6688/7378 [22:55:49<2:20:58, 12.26s/it] + 91%|█████████ | 6689/7378 [22:56:01<2:21:20, 12.31s/it] + +{'loss': 0.3774, 'learning_rate': 4.539977344292168e-07, 'epoch': 0.91} + + 91%|█████████ | 6689/7378 [22:56:01<2:21:20, 12.31s/it] + 91%|█████████ | 6690/7378 [22:56:13<2:21:04, 12.30s/it] + +{'loss': 0.4023, 'learning_rate': 4.526908723910339e-07, 'epoch': 0.91} + + 91%|█████████ | 6690/7378 [22:56:13<2:21:04, 12.30s/it] + 91%|█████████ | 6691/7378 [22:56:26<2:20:21, 12.26s/it] + +{'loss': 
0.4118, 'learning_rate': 4.513858504470625e-07, 'epoch': 0.91} + + 91%|█████████ | 6691/7378 [22:56:26<2:20:21, 12.26s/it] + 91%|█████████ | 6692/7378 [22:56:38<2:21:14, 12.35s/it] + +{'loss': 0.4676, 'learning_rate': 4.500826688488269e-07, 'epoch': 0.91} + + 91%|█████████ | 6692/7378 [22:56:38<2:21:14, 12.35s/it] + 91%|█████████ | 6693/7378 [22:56:50<2:20:15, 12.28s/it] + +{'loss': 0.4256, 'learning_rate': 4.4878132784749063e-07, 'epoch': 0.91} + + 91%|█████████ | 6693/7378 [22:56:50<2:20:15, 12.28s/it] + 91%|█████████ | 6694/7378 [22:57:02<2:19:55, 12.27s/it] + +{'loss': 0.4066, 'learning_rate': 4.4748182769387196e-07, 'epoch': 0.91} + + 91%|█████████ | 6694/7378 [22:57:03<2:19:55, 12.27s/it] + 91%|█████████ | 6695/7378 [22:57:15<2:19:09, 12.22s/it] + +{'loss': 0.317, 'learning_rate': 4.4618416863842606e-07, 'epoch': 0.91} + + 91%|█████████ | 6695/7378 [22:57:15<2:19:09, 12.22s/it] + 91%|█████████ | 6696/7378 [22:57:27<2:18:11, 12.16s/it] + +{'loss': 0.4922, 'learning_rate': 4.4488835093125736e-07, 'epoch': 0.91} + + 91%|█████████ | 6696/7378 [22:57:27<2:18:11, 12.16s/it] + 91%|█████████ | 6697/7378 [22:57:39<2:18:35, 12.21s/it] + +{'loss': 0.428, 'learning_rate': 4.4359437482211276e-07, 'epoch': 0.91} + + 91%|█████████ | 6697/7378 [22:57:39<2:18:35, 12.21s/it] + 91%|█████████ | 6698/7378 [22:57:51<2:18:21, 12.21s/it] + +{'loss': 0.4289, 'learning_rate': 4.423022405603894e-07, 'epoch': 0.91} + + 91%|█████████ | 6698/7378 [22:57:51<2:18:21, 12.21s/it] + 91%|█████████ | 6699/7378 [22:58:03<2:18:00, 12.19s/it] + +{'loss': 0.3958, 'learning_rate': 4.4101194839512364e-07, 'epoch': 0.91} + + 91%|█████████ | 6699/7378 [22:58:03<2:18:00, 12.19s/it] + 91%|█████████ | 6700/7378 [22:58:15<2:17:25, 12.16s/it] + +{'loss': 0.3985, 'learning_rate': 4.3972349857499874e-07, 'epoch': 0.91} + + 91%|█████████ | 6700/7378 [22:58:15<2:17:25, 12.16s/it] + 91%|█████████ | 6701/7378 [22:58:28<2:17:07, 12.15s/it] + +{'loss': 0.4617, 'learning_rate': 4.384368913483439e-07, 'epoch': 0.91} 
+ + 91%|█████████ | 6701/7378 [22:58:28<2:17:07, 12.15s/it] + 91%|█████████ | 6702/7378 [22:58:40<2:16:39, 12.13s/it] + +{'loss': 0.4178, 'learning_rate': 4.371521269631307e-07, 'epoch': 0.91} + + 91%|█████████ | 6702/7378 [22:58:40<2:16:39, 12.13s/it] + 91%|█████████ | 6703/7378 [22:58:52<2:17:45, 12.24s/it] + +{'loss': 0.4348, 'learning_rate': 4.3586920566698e-07, 'epoch': 0.91} + + 91%|█████████ | 6703/7378 [22:58:52<2:17:45, 12.24s/it] + 91%|█████████ | 6704/7378 [22:59:05<2:19:53, 12.45s/it] + +{'loss': 0.4437, 'learning_rate': 4.34588127707154e-07, 'epoch': 0.91} + + 91%|█████████ | 6704/7378 [22:59:05<2:19:53, 12.45s/it] + 91%|█████████ | 6705/7378 [22:59:17<2:18:42, 12.37s/it] + +{'loss': 0.4259, 'learning_rate': 4.333088933305607e-07, 'epoch': 0.91} + + 91%|█████████ | 6705/7378 [22:59:17<2:18:42, 12.37s/it] + 91%|█████████ | 6706/7378 [22:59:29<2:17:48, 12.30s/it] + +{'loss': 0.4896, 'learning_rate': 4.3203150278375184e-07, 'epoch': 0.91} + + 91%|█████████ | 6706/7378 [22:59:29<2:17:48, 12.30s/it] + 91%|█████████ | 6707/7378 [22:59:41<2:16:58, 12.25s/it] + +{'loss': 0.3714, 'learning_rate': 4.307559563129238e-07, 'epoch': 0.91} + + 91%|█████████ | 6707/7378 [22:59:41<2:16:58, 12.25s/it] + 91%|█████████ | 6708/7378 [22:59:53<2:15:44, 12.16s/it] + +{'loss': 0.4136, 'learning_rate': 4.2948225416391986e-07, 'epoch': 0.91} + + 91%|█████████ | 6708/7378 [22:59:53<2:15:44, 12.16s/it] + 91%|█████████ | 6709/7378 [23:00:06<2:16:49, 12.27s/it] + +{'loss': 0.4675, 'learning_rate': 4.2821039658222483e-07, 'epoch': 0.91} + + 91%|█████████ | 6709/7378 [23:00:06<2:16:49, 12.27s/it] + 91%|█████████ | 6710/7378 [23:00:19<2:17:59, 12.39s/it] + +{'loss': 0.4444, 'learning_rate': 4.269403838129704e-07, 'epoch': 0.91} + + 91%|█████████ | 6710/7378 [23:00:19<2:17:59, 12.39s/it] + 91%|█████████ | 6711/7378 [23:00:31<2:17:11, 12.34s/it] + +{'loss': 0.4199, 'learning_rate': 4.2567221610092966e-07, 'epoch': 0.91} + + 91%|█████████ | 6711/7378 [23:00:31<2:17:11, 12.34s/it] + 
91%|█████████ | 6712/7378 [23:00:43<2:15:42, 12.23s/it] + +{'loss': 0.4363, 'learning_rate': 4.2440589369052265e-07, 'epoch': 0.91} + + 91%|█████████ | 6712/7378 [23:00:43<2:15:42, 12.23s/it] + 91%|█████████ | 6713/7378 [23:00:56<2:17:11, 12.38s/it] + +{'loss': 0.441, 'learning_rate': 4.231414168258163e-07, 'epoch': 0.91} + + 91%|█████████ | 6713/7378 [23:00:56<2:17:11, 12.38s/it] + 91%|█████████ | 6714/7378 [23:01:08<2:16:22, 12.32s/it] + +{'loss': 0.442, 'learning_rate': 4.2187878575051466e-07, 'epoch': 0.91} + + 91%|█████████ | 6714/7378 [23:01:08<2:16:22, 12.32s/it] + 91%|█████████ | 6715/7378 [23:01:20<2:17:01, 12.40s/it] + +{'loss': 0.3776, 'learning_rate': 4.2061800070797186e-07, 'epoch': 0.91} + + 91%|█████████ | 6715/7378 [23:01:20<2:17:01, 12.40s/it] + 91%|█████████ | 6716/7378 [23:01:33<2:17:33, 12.47s/it] + +{'loss': 0.3793, 'learning_rate': 4.193590619411847e-07, 'epoch': 0.91} + + 91%|█████████ | 6716/7378 [23:01:33<2:17:33, 12.47s/it] + 91%|█████████ | 6717/7378 [23:01:45<2:16:51, 12.42s/it] + +{'loss': 0.5345, 'learning_rate': 4.181019696927924e-07, 'epoch': 0.91} + + 91%|█████████ | 6717/7378 [23:01:45<2:16:51, 12.42s/it] + 91%|█████████ | 6718/7378 [23:01:58<2:16:15, 12.39s/it] + +{'loss': 0.4025, 'learning_rate': 4.168467242050822e-07, 'epoch': 0.91} + + 91%|█████████ | 6718/7378 [23:01:58<2:16:15, 12.39s/it] + 91%|█████████ | 6719/7378 [23:02:10<2:16:22, 12.42s/it] + +{'loss': 0.4252, 'learning_rate': 4.155933257199807e-07, 'epoch': 0.91} + + 91%|█████████ | 6719/7378 [23:02:10<2:16:22, 12.42s/it] + 91%|█████████ | 6720/7378 [23:02:22<2:15:04, 12.32s/it] + +{'loss': 0.4163, 'learning_rate': 4.1434177447906343e-07, 'epoch': 0.91} + + 91%|█████████ | 6720/7378 [23:02:22<2:15:04, 12.32s/it] + 91%|█████████ | 6721/7378 [23:02:35<2:16:29, 12.46s/it] + +{'loss': 0.457, 'learning_rate': 4.1309207072354305e-07, 'epoch': 0.91} + + 91%|█████████ | 6721/7378 [23:02:35<2:16:29, 12.46s/it] + 91%|█████████ | 6722/7378 [23:02:48<2:16:41, 12.50s/it] + +{'loss': 
0.421, 'learning_rate': 4.118442146942847e-07, 'epoch': 0.91} + + 91%|█████████ | 6722/7378 [23:02:48<2:16:41, 12.50s/it] + 91%|█████████ | 6723/7378 [23:03:00<2:15:47, 12.44s/it] + +{'loss': 0.4281, 'learning_rate': 4.105982066317904e-07, 'epoch': 0.91} + + 91%|█████████ | 6723/7378 [23:03:00<2:15:47, 12.44s/it] + 91%|█████████ | 6724/7378 [23:03:12<2:14:33, 12.34s/it] + +{'loss': 0.4372, 'learning_rate': 4.0935404677621025e-07, 'epoch': 0.91} + + 91%|█████████ | 6724/7378 [23:03:12<2:14:33, 12.34s/it] + 91%|█████████ | 6725/7378 [23:03:24<2:13:52, 12.30s/it] + +{'loss': 0.4046, 'learning_rate': 4.0811173536733586e-07, 'epoch': 0.91} + + 91%|█████████ | 6725/7378 [23:03:24<2:13:52, 12.30s/it] + 91%|█████████ | 6726/7378 [23:03:36<2:13:35, 12.29s/it] + +{'loss': 0.4144, 'learning_rate': 4.0687127264460224e-07, 'epoch': 0.91} + + 91%|█████████ | 6726/7378 [23:03:36<2:13:35, 12.29s/it] + 91%|█████████ | 6727/7378 [23:03:49<2:14:10, 12.37s/it] + +{'loss': 0.4225, 'learning_rate': 4.0563265884709157e-07, 'epoch': 0.91} + + 91%|█████████ | 6727/7378 [23:03:49<2:14:10, 12.37s/it] + 91%|█████████ | 6728/7378 [23:04:01<2:12:51, 12.26s/it] + +{'loss': 0.4348, 'learning_rate': 4.043958942135262e-07, 'epoch': 0.91} + + 91%|█████████ | 6728/7378 [23:04:01<2:12:51, 12.26s/it] + 91%|█████████ | 6729/7378 [23:04:13<2:11:46, 12.18s/it] + +{'loss': 0.4485, 'learning_rate': 4.0316097898227215e-07, 'epoch': 0.91} + + 91%|█████████ | 6729/7378 [23:04:13<2:11:46, 12.18s/it] + 91%|█████████ | 6730/7378 [23:04:25<2:11:28, 12.17s/it] + +{'loss': 0.3811, 'learning_rate': 4.0192791339133896e-07, 'epoch': 0.91} + + 91%|█████████ | 6730/7378 [23:04:25<2:11:28, 12.17s/it] + 91%|█████████ | 6731/7378 [23:04:38<2:12:12, 12.26s/it] + +{'loss': 0.4424, 'learning_rate': 4.0069669767838436e-07, 'epoch': 0.91} + + 91%|█████████ | 6731/7378 [23:04:38<2:12:12, 12.26s/it] + 91%|█████████ | 6732/7378 [23:04:50<2:11:38, 12.23s/it] + +{'loss': 0.492, 'learning_rate': 3.994673320807041e-07, 'epoch': 0.91} + 
+ 91%|█████████ | 6732/7378 [23:04:50<2:11:38, 12.23s/it] + 91%|█████████▏| 6733/7378 [23:05:02<2:11:45, 12.26s/it] + +{'loss': 0.4335, 'learning_rate': 3.982398168352386e-07, 'epoch': 0.91} + + 91%|█████████▏| 6733/7378 [23:05:02<2:11:45, 12.26s/it] + 91%|█████████▏| 6734/7378 [23:05:14<2:11:19, 12.24s/it] + +{'loss': 0.4662, 'learning_rate': 3.970141521785731e-07, 'epoch': 0.91} + + 91%|█████████▏| 6734/7378 [23:05:14<2:11:19, 12.24s/it] + 91%|█████████▏| 6735/7378 [23:05:26<2:11:00, 12.22s/it] + +{'loss': 0.4231, 'learning_rate': 3.9579033834693303e-07, 'epoch': 0.91} + + 91%|█████████▏| 6735/7378 [23:05:26<2:11:00, 12.22s/it] + 91%|█████████▏| 6736/7378 [23:05:39<2:11:00, 12.24s/it] + +{'loss': 0.4345, 'learning_rate': 3.94568375576192e-07, 'epoch': 0.91} + + 91%|█████████▏| 6736/7378 [23:05:39<2:11:00, 12.24s/it] + 91%|█████████▏| 6737/7378 [23:05:51<2:11:16, 12.29s/it] + +{'loss': 0.4495, 'learning_rate': 3.9334826410186377e-07, 'epoch': 0.91} + + 91%|█████████▏| 6737/7378 [23:05:51<2:11:16, 12.29s/it] + 91%|█████████▏| 6738/7378 [23:06:04<2:11:25, 12.32s/it] + +{'loss': 0.4123, 'learning_rate': 3.9213000415910473e-07, 'epoch': 0.91} + + 91%|█████████▏| 6738/7378 [23:06:04<2:11:25, 12.32s/it] + 91%|█████████▏| 6739/7378 [23:06:16<2:10:38, 12.27s/it] + +{'loss': 0.4357, 'learning_rate': 3.9091359598271483e-07, 'epoch': 0.91} + + 91%|█████████▏| 6739/7378 [23:06:16<2:10:38, 12.27s/it] + 91%|█████████▏| 6740/7378 [23:06:28<2:12:08, 12.43s/it] + +{'loss': 0.4955, 'learning_rate': 3.896990398071399e-07, 'epoch': 0.91} + + 91%|█████████▏| 6740/7378 [23:06:28<2:12:08, 12.43s/it] + 91%|█████████▏| 6741/7378 [23:06:40<2:10:27, 12.29s/it] + +{'loss': 0.3901, 'learning_rate': 3.884863358664648e-07, 'epoch': 0.91} + + 91%|█████████▏| 6741/7378 [23:06:40<2:10:27, 12.29s/it] + 91%|█████████▏| 6742/7378 [23:06:52<2:09:14, 12.19s/it] + +{'loss': 0.4062, 'learning_rate': 3.872754843944204e-07, 'epoch': 0.91} + + 91%|█████████▏| 6742/7378 [23:06:52<2:09:14, 12.19s/it] + 
91%|█████████▏| 6743/7378 [23:07:05<2:10:33, 12.34s/it] + +{'loss': 0.4527, 'learning_rate': 3.8606648562437787e-07, 'epoch': 0.91} + + 91%|█████████▏| 6743/7378 [23:07:05<2:10:33, 12.34s/it] + 91%|█████████▏| 6744/7378 [23:07:18<2:11:17, 12.43s/it] + +{'loss': 0.4958, 'learning_rate': 3.8485933978935297e-07, 'epoch': 0.91} + + 91%|█████████▏| 6744/7378 [23:07:18<2:11:17, 12.43s/it] + 91%|█████████▏| 6745/7378 [23:07:30<2:10:17, 12.35s/it] + +{'loss': 0.4024, 'learning_rate': 3.8365404712200624e-07, 'epoch': 0.91} + + 91%|█████████▏| 6745/7378 [23:07:30<2:10:17, 12.35s/it] + 91%|█████████▏| 6746/7378 [23:07:42<2:10:17, 12.37s/it] + +{'loss': 0.4533, 'learning_rate': 3.824506078546353e-07, 'epoch': 0.91} + + 91%|█████████▏| 6746/7378 [23:07:42<2:10:17, 12.37s/it] + 91%|█████████▏| 6747/7378 [23:07:54<2:09:08, 12.28s/it] + +{'loss': 0.4682, 'learning_rate': 3.8124902221918783e-07, 'epoch': 0.91} + + 91%|█████████▏| 6747/7378 [23:07:54<2:09:08, 12.28s/it] + 91%|█████████▏| 6748/7378 [23:08:07<2:08:49, 12.27s/it] + +{'loss': 0.4008, 'learning_rate': 3.800492904472497e-07, 'epoch': 0.91} + + 91%|█████████▏| 6748/7378 [23:08:07<2:08:49, 12.27s/it] + 91%|█████████▏| 6749/7378 [23:08:19<2:08:38, 12.27s/it] + +{'loss': 0.4747, 'learning_rate': 3.788514127700493e-07, 'epoch': 0.91} + + 91%|█████████▏| 6749/7378 [23:08:19<2:08:38, 12.27s/it] + 91%|█████████▏| 6750/7378 [23:08:31<2:08:15, 12.25s/it] + +{'loss': 0.4779, 'learning_rate': 3.776553894184598e-07, 'epoch': 0.91} + + 91%|█████████▏| 6750/7378 [23:08:31<2:08:15, 12.25s/it] + 92%|█████████▏| 6751/7378 [23:08:43<2:08:02, 12.25s/it] + +{'loss': 0.4899, 'learning_rate': 3.764612206229956e-07, 'epoch': 0.92} + + 92%|█████████▏| 6751/7378 [23:08:43<2:08:02, 12.25s/it] + 92%|█████████▏| 6752/7378 [23:08:56<2:08:12, 12.29s/it] + +{'loss': 0.4284, 'learning_rate': 3.7526890661381375e-07, 'epoch': 0.92} + + 92%|█████████▏| 6752/7378 [23:08:56<2:08:12, 12.29s/it] + 92%|█████████▏| 6753/7378 [23:09:08<2:09:14, 12.41s/it] + 
+{'loss': 0.461, 'learning_rate': 3.740784476207149e-07, 'epoch': 0.92} + + 92%|█████████▏| 6753/7378 [23:09:08<2:09:14, 12.41s/it] + 92%|█████████▏| 6754/7378 [23:09:21<2:08:43, 12.38s/it] + +{'loss': 0.4303, 'learning_rate': 3.728898438731388e-07, 'epoch': 0.92} + + 92%|█████████▏| 6754/7378 [23:09:21<2:08:43, 12.38s/it] + 92%|█████████▏| 6755/7378 [23:09:33<2:07:58, 12.33s/it] + +{'loss': 0.3673, 'learning_rate': 3.7170309560017327e-07, 'epoch': 0.92} + + 92%|█████████▏| 6755/7378 [23:09:33<2:07:58, 12.33s/it] + 92%|█████████▏| 6756/7378 [23:09:45<2:06:56, 12.24s/it] + +{'loss': 0.4641, 'learning_rate': 3.7051820303054544e-07, 'epoch': 0.92} + + 92%|█████████▏| 6756/7378 [23:09:45<2:06:56, 12.24s/it] + 92%|█████████▏| 6757/7378 [23:09:57<2:07:06, 12.28s/it] + +{'loss': 0.4199, 'learning_rate': 3.6933516639262257e-07, 'epoch': 0.92} + + 92%|█████████▏| 6757/7378 [23:09:57<2:07:06, 12.28s/it] + 92%|█████████▏| 6758/7378 [23:10:10<2:07:52, 12.37s/it] + +{'loss': 0.3735, 'learning_rate': 3.681539859144168e-07, 'epoch': 0.92} + + 92%|█████████▏| 6758/7378 [23:10:10<2:07:52, 12.37s/it] + 92%|█████████▏| 6759/7378 [23:10:22<2:07:09, 12.33s/it] + +{'loss': 0.4272, 'learning_rate': 3.6697466182358366e-07, 'epoch': 0.92} + + 92%|█████████▏| 6759/7378 [23:10:22<2:07:09, 12.33s/it] + 92%|█████████▏| 6760/7378 [23:10:34<2:06:40, 12.30s/it] + +{'loss': 0.4205, 'learning_rate': 3.6579719434741924e-07, 'epoch': 0.92} + + 92%|█████████▏| 6760/7378 [23:10:34<2:06:40, 12.30s/it] + 92%|█████████▏| 6761/7378 [23:10:46<2:05:47, 12.23s/it] + +{'loss': 0.4105, 'learning_rate': 3.6462158371286194e-07, 'epoch': 0.92} + + 92%|█████████▏| 6761/7378 [23:10:47<2:05:47, 12.23s/it] + 92%|█████████▏| 6762/7378 [23:10:59<2:06:53, 12.36s/it] + +{'loss': 0.4716, 'learning_rate': 3.6344783014649054e-07, 'epoch': 0.92} + + 92%|█████████▏| 6762/7378 [23:10:59<2:06:53, 12.36s/it] + 92%|█████████▏| 6763/7378 [23:11:12<2:07:30, 12.44s/it] + +{'loss': 0.4468, 'learning_rate': 3.6227593387452743e-07, 
'epoch': 0.92} + + 92%|█████████▏| 6763/7378 [23:11:12<2:07:30, 12.44s/it] + 92%|█████████▏| 6764/7378 [23:11:24<2:07:49, 12.49s/it] + +{'loss': 0.4073, 'learning_rate': 3.6110589512284076e-07, 'epoch': 0.92} + + 92%|█████████▏| 6764/7378 [23:11:24<2:07:49, 12.49s/it] + 92%|█████████▏| 6765/7378 [23:11:37<2:08:56, 12.62s/it] + +{'loss': 0.4235, 'learning_rate': 3.599377141169336e-07, 'epoch': 0.92} + + 92%|█████████▏| 6765/7378 [23:11:37<2:08:56, 12.62s/it] + 92%|█████████▏| 6766/7378 [23:11:50<2:08:53, 12.64s/it] + +{'loss': 0.4222, 'learning_rate': 3.587713910819568e-07, 'epoch': 0.92} + + 92%|█████████▏| 6766/7378 [23:11:50<2:08:53, 12.64s/it] + 92%|█████████▏| 6767/7378 [23:12:02<2:07:44, 12.54s/it] + +{'loss': 0.4393, 'learning_rate': 3.5760692624269956e-07, 'epoch': 0.92} + + 92%|█████████▏| 6767/7378 [23:12:02<2:07:44, 12.54s/it] + 92%|█████████▏| 6768/7378 [23:12:15<2:06:55, 12.48s/it] + +{'loss': 0.3717, 'learning_rate': 3.564443198235945e-07, 'epoch': 0.92} + + 92%|█████████▏| 6768/7378 [23:12:15<2:06:55, 12.48s/it] + 92%|█████████▏| 6769/7378 [23:12:27<2:05:49, 12.40s/it] + +{'loss': 0.394, 'learning_rate': 3.5528357204871686e-07, 'epoch': 0.92} + + 92%|█████████▏| 6769/7378 [23:12:27<2:05:49, 12.40s/it] + 92%|█████████▏| 6770/7378 [23:12:39<2:04:34, 12.29s/it] + +{'loss': 0.4641, 'learning_rate': 3.541246831417811e-07, 'epoch': 0.92} + + 92%|█████████▏| 6770/7378 [23:12:39<2:04:34, 12.29s/it] + 92%|█████████▏| 6771/7378 [23:12:51<2:03:15, 12.18s/it] + +{'loss': 0.4182, 'learning_rate': 3.5296765332614615e-07, 'epoch': 0.92} + + 92%|█████████▏| 6771/7378 [23:12:51<2:03:15, 12.18s/it] + 92%|█████████▏| 6772/7378 [23:13:04<2:05:18, 12.41s/it] + +{'loss': 0.5047, 'learning_rate': 3.5181248282480815e-07, 'epoch': 0.92} + + 92%|█████████▏| 6772/7378 [23:13:04<2:05:18, 12.41s/it] + 92%|█████████▏| 6773/7378 [23:13:16<2:04:04, 12.31s/it] + +{'loss': 0.4407, 'learning_rate': 3.506591718604124e-07, 'epoch': 0.92} + + 92%|█████████▏| 6773/7378 [23:13:16<2:04:04, 
12.31s/it] + 92%|█████████▏| 6774/7378 [23:13:28<2:03:30, 12.27s/it] + +{'loss': 0.4329, 'learning_rate': 3.4950772065523996e-07, 'epoch': 0.92} + + 92%|█████████▏| 6774/7378 [23:13:28<2:03:30, 12.27s/it] + 92%|█████████▏| 6775/7378 [23:13:40<2:02:34, 12.20s/it] + +{'loss': 0.4554, 'learning_rate': 3.483581294312155e-07, 'epoch': 0.92} + + 92%|█████████▏| 6775/7378 [23:13:40<2:02:34, 12.20s/it] + 92%|█████████▏| 6776/7378 [23:13:52<2:03:01, 12.26s/it] + +{'loss': 0.43, 'learning_rate': 3.472103984099029e-07, 'epoch': 0.92} + + 92%|█████████▏| 6776/7378 [23:13:52<2:03:01, 12.26s/it] + 92%|█████████▏| 6777/7378 [23:14:05<2:02:31, 12.23s/it] + +{'loss': 0.4727, 'learning_rate': 3.4606452781250966e-07, 'epoch': 0.92} + + 92%|█████████▏| 6777/7378 [23:14:05<2:02:31, 12.23s/it] + 92%|█████████▏| 6778/7378 [23:14:17<2:02:15, 12.23s/it] + +{'loss': 0.415, 'learning_rate': 3.449205178598869e-07, 'epoch': 0.92} + + 92%|█████████▏| 6778/7378 [23:14:17<2:02:15, 12.23s/it] + 92%|█████████▏| 6779/7378 [23:14:29<2:02:57, 12.32s/it] + +{'loss': 0.4908, 'learning_rate': 3.4377836877252156e-07, 'epoch': 0.92} + + 92%|█████████▏| 6779/7378 [23:14:29<2:02:57, 12.32s/it] + 92%|█████████▏| 6780/7378 [23:14:42<2:02:54, 12.33s/it] + +{'loss': 0.4461, 'learning_rate': 3.426380807705476e-07, 'epoch': 0.92} + + 92%|█████████▏| 6780/7378 [23:14:42<2:02:54, 12.33s/it] + 92%|█████████▏| 6781/7378 [23:14:55<2:04:36, 12.52s/it] + +{'loss': 0.4489, 'learning_rate': 3.4149965407373474e-07, 'epoch': 0.92} + + 92%|█████████▏| 6781/7378 [23:14:55<2:04:36, 12.52s/it] + 92%|█████████▏| 6782/7378 [23:15:07<2:04:00, 12.48s/it] + +{'loss': 0.4044, 'learning_rate': 3.403630889014986e-07, 'epoch': 0.92} + + 92%|█████████▏| 6782/7378 [23:15:07<2:04:00, 12.48s/it] + 92%|█████████▏| 6783/7378 [23:15:19<2:03:11, 12.42s/it] + +{'loss': 0.4068, 'learning_rate': 3.3922838547289507e-07, 'epoch': 0.92} + + 92%|█████████▏| 6783/7378 [23:15:19<2:03:11, 12.42s/it] + 92%|█████████▏| 6784/7378 [23:15:32<2:03:28, 
12.47s/it] + +{'loss': 0.4307, 'learning_rate': 3.380955440066203e-07, 'epoch': 0.92} + + 92%|█████████▏| 6784/7378 [23:15:32<2:03:28, 12.47s/it] + 92%|█████████▏| 6785/7378 [23:15:44<2:01:53, 12.33s/it] + +{'loss': 0.4621, 'learning_rate': 3.369645647210096e-07, 'epoch': 0.92} + + 92%|█████████▏| 6785/7378 [23:15:44<2:01:53, 12.33s/it] + 92%|█████████▏| 6786/7378 [23:15:56<2:00:51, 12.25s/it] + +{'loss': 0.4698, 'learning_rate': 3.358354478340431e-07, 'epoch': 0.92} + + 92%|█████████▏| 6786/7378 [23:15:56<2:00:51, 12.25s/it] + 92%|█████████▏| 6787/7378 [23:16:09<2:01:45, 12.36s/it] + +{'loss': 0.415, 'learning_rate': 3.3470819356334003e-07, 'epoch': 0.92} + + 92%|█████████▏| 6787/7378 [23:16:09<2:01:45, 12.36s/it] + 92%|█████████▏| 6788/7378 [23:16:21<2:01:22, 12.34s/it] + +{'loss': 0.4526, 'learning_rate': 3.335828021261622e-07, 'epoch': 0.92} + + 92%|█████████▏| 6788/7378 [23:16:21<2:01:22, 12.34s/it] + 92%|█████████▏| 6789/7378 [23:16:33<1:59:42, 12.19s/it] + +{'loss': 0.4289, 'learning_rate': 3.324592737394083e-07, 'epoch': 0.92} + + 92%|█████████▏| 6789/7378 [23:16:33<1:59:42, 12.19s/it] + 92%|█████████▏| 6790/7378 [23:16:45<1:59:18, 12.17s/it] + +{'loss': 0.4571, 'learning_rate': 3.3133760861962404e-07, 'epoch': 0.92} + + 92%|█████████▏| 6790/7378 [23:16:45<1:59:18, 12.17s/it] + 92%|█████████▏| 6791/7378 [23:16:57<2:00:03, 12.27s/it] + +{'loss': 0.4767, 'learning_rate': 3.302178069829909e-07, 'epoch': 0.92} + + 92%|█████████▏| 6791/7378 [23:16:57<2:00:03, 12.27s/it] + 92%|█████████▏| 6792/7378 [23:17:10<2:01:01, 12.39s/it] + +{'loss': 0.3667, 'learning_rate': 3.29099869045334e-07, 'epoch': 0.92} + + 92%|█████████▏| 6792/7378 [23:17:10<2:01:01, 12.39s/it] + 92%|█████████▏| 6793/7378 [23:17:22<1:59:58, 12.30s/it] + +{'loss': 0.4415, 'learning_rate': 3.279837950221176e-07, 'epoch': 0.92} + + 92%|█████████▏| 6793/7378 [23:17:22<1:59:58, 12.30s/it] + 92%|█████████▏| 6794/7378 [23:17:35<2:01:43, 12.51s/it] + +{'loss': 0.4536, 'learning_rate': 
3.2686958512844867e-07, 'epoch': 0.92} + + 92%|█████████▏| 6794/7378 [23:17:35<2:01:43, 12.51s/it] + 92%|█████████▏| 6795/7378 [23:17:47<2:00:11, 12.37s/it] + +{'loss': 0.4324, 'learning_rate': 3.25757239579072e-07, 'epoch': 0.92} + + 92%|█████████▏| 6795/7378 [23:17:47<2:00:11, 12.37s/it] + 92%|█████████▏| 6796/7378 [23:18:00<2:00:51, 12.46s/it] + +{'loss': 0.5191, 'learning_rate': 3.24646758588375e-07, 'epoch': 0.92} + + 92%|█████████▏| 6796/7378 [23:18:00<2:00:51, 12.46s/it] + 92%|█████████▏| 6797/7378 [23:18:13<2:01:25, 12.54s/it] + +{'loss': 0.4723, 'learning_rate': 3.235381423703865e-07, 'epoch': 0.92} + + 92%|█████████▏| 6797/7378 [23:18:13<2:01:25, 12.54s/it] + 92%|█████████▏| 6798/7378 [23:18:25<2:00:21, 12.45s/it] + +{'loss': 0.441, 'learning_rate': 3.224313911387755e-07, 'epoch': 0.92} + + 92%|█████████▏| 6798/7378 [23:18:25<2:00:21, 12.45s/it] + 92%|█████████▏| 6799/7378 [23:18:37<2:00:02, 12.44s/it] + +{'loss': 0.4022, 'learning_rate': 3.2132650510684924e-07, 'epoch': 0.92} + + 92%|█████████▏| 6799/7378 [23:18:37<2:00:02, 12.44s/it] + 92%|█████████▏| 6800/7378 [23:18:49<1:59:02, 12.36s/it] + +{'loss': 0.3916, 'learning_rate': 3.202234844875574e-07, 'epoch': 0.92} + + 92%|█████████▏| 6800/7378 [23:18:49<1:59:02, 12.36s/it] + 92%|█████████▏| 6801/7378 [23:19:02<1:58:42, 12.34s/it] + +{'loss': 0.4558, 'learning_rate': 3.19122329493492e-07, 'epoch': 0.92} + + 92%|█████████▏| 6801/7378 [23:19:02<1:58:42, 12.34s/it] + 92%|█████████▏| 6802/7378 [23:19:14<1:59:35, 12.46s/it] + +{'loss': 0.4857, 'learning_rate': 3.1802304033688004e-07, 'epoch': 0.92} + + 92%|█████████▏| 6802/7378 [23:19:14<1:59:35, 12.46s/it] + 92%|█████████▏| 6803/7378 [23:19:26<1:58:01, 12.32s/it] + +{'loss': 0.4268, 'learning_rate': 3.169256172295954e-07, 'epoch': 0.92} + + 92%|█████████▏| 6803/7378 [23:19:27<1:58:01, 12.32s/it] + 92%|█████████▏| 6804/7378 [23:19:39<1:58:31, 12.39s/it] + +{'loss': 0.4324, 'learning_rate': 3.1583006038314767e-07, 'epoch': 0.92} + + 92%|█████████▏| 6804/7378 
[23:19:39<1:58:31, 12.39s/it] + 92%|█████████▏| 6805/7378 [23:19:51<1:56:58, 12.25s/it] + +{'loss': 0.4391, 'learning_rate': 3.1473637000868694e-07, 'epoch': 0.92} + + 92%|█████████▏| 6805/7378 [23:19:51<1:56:58, 12.25s/it] + 92%|█████████▏| 6806/7378 [23:20:03<1:56:31, 12.22s/it] + +{'loss': 0.404, 'learning_rate': 3.136445463170079e-07, 'epoch': 0.92} + + 92%|█████████▏| 6806/7378 [23:20:03<1:56:31, 12.22s/it] + 92%|█████████▏| 6807/7378 [23:20:15<1:56:43, 12.26s/it] + +{'loss': 0.4066, 'learning_rate': 3.1255458951854113e-07, 'epoch': 0.92} + + 92%|█████████▏| 6807/7378 [23:20:15<1:56:43, 12.26s/it] + 92%|█████████▏| 6808/7378 [23:20:28<1:57:25, 12.36s/it] + +{'loss': 0.4041, 'learning_rate': 3.114664998233585e-07, 'epoch': 0.92} + + 92%|█████████▏| 6808/7378 [23:20:28<1:57:25, 12.36s/it] + 92%|█████████▏| 6809/7378 [23:20:41<1:57:41, 12.41s/it] + +{'loss': 0.4115, 'learning_rate': 3.103802774411702e-07, 'epoch': 0.92} + + 92%|█████████▏| 6809/7378 [23:20:41<1:57:41, 12.41s/it] + 92%|█████████▏| 6810/7378 [23:20:53<1:57:38, 12.43s/it] + +{'loss': 0.4433, 'learning_rate': 3.0929592258133303e-07, 'epoch': 0.92} + + 92%|█████████▏| 6810/7378 [23:20:53<1:57:38, 12.43s/it] + 92%|█████████▏| 6811/7378 [23:21:05<1:56:03, 12.28s/it] + +{'loss': 0.4587, 'learning_rate': 3.0821343545283657e-07, 'epoch': 0.92} + + 92%|█████████▏| 6811/7378 [23:21:05<1:56:03, 12.28s/it] + 92%|█████████▏| 6812/7378 [23:21:17<1:54:55, 12.18s/it] + +{'loss': 0.423, 'learning_rate': 3.071328162643139e-07, 'epoch': 0.92} + + 92%|█████████▏| 6812/7378 [23:21:17<1:54:55, 12.18s/it] + 92%|█████████▏| 6813/7378 [23:21:29<1:55:12, 12.23s/it] + +{'loss': 0.4363, 'learning_rate': 3.0605406522403624e-07, 'epoch': 0.92} + + 92%|█████████▏| 6813/7378 [23:21:29<1:55:12, 12.23s/it] + 92%|█████████▏| 6814/7378 [23:21:42<1:55:15, 12.26s/it] + +{'loss': 0.3695, 'learning_rate': 3.0497718253991724e-07, 'epoch': 0.92} + + 92%|█████████▏| 6814/7378 [23:21:42<1:55:15, 12.26s/it] + 92%|█████████▏| 6815/7378 
[23:21:54<1:55:03, 12.26s/it] + +{'loss': 0.427, 'learning_rate': 3.0390216841950873e-07, 'epoch': 0.92} + + 92%|█████████▏| 6815/7378 [23:21:54<1:55:03, 12.26s/it] + 92%|█████████▏| 6816/7378 [23:22:06<1:53:42, 12.14s/it] + +{'loss': 0.3997, 'learning_rate': 3.0282902307000375e-07, 'epoch': 0.92} + + 92%|█████████▏| 6816/7378 [23:22:06<1:53:42, 12.14s/it] + 92%|█████████▏| 6817/7378 [23:22:18<1:53:32, 12.14s/it] + +{'loss': 0.4214, 'learning_rate': 3.0175774669823356e-07, 'epoch': 0.92} + + 92%|█████████▏| 6817/7378 [23:22:18<1:53:32, 12.14s/it] + 92%|█████████▏| 6818/7378 [23:22:31<1:54:52, 12.31s/it] + +{'loss': 0.4347, 'learning_rate': 3.0068833951066747e-07, 'epoch': 0.92} + + 92%|█████████▏| 6818/7378 [23:22:31<1:54:52, 12.31s/it] + 92%|█████████▏| 6819/7378 [23:22:43<1:54:24, 12.28s/it] + +{'loss': 0.4238, 'learning_rate': 2.996208017134217e-07, 'epoch': 0.92} + + 92%|█████████▏| 6819/7378 [23:22:43<1:54:24, 12.28s/it] + 92%|█████████▏| 6820/7378 [23:22:55<1:54:38, 12.33s/it] + +{'loss': 0.4792, 'learning_rate': 2.9855513351224494e-07, 'epoch': 0.92} + + 92%|█████████▏| 6820/7378 [23:22:55<1:54:38, 12.33s/it] + 92%|█████████▏| 6821/7378 [23:23:08<1:54:32, 12.34s/it] + +{'loss': 0.4901, 'learning_rate': 2.974913351125275e-07, 'epoch': 0.92} + + 92%|█████████▏| 6821/7378 [23:23:08<1:54:32, 12.34s/it] + 92%|█████████▏| 6822/7378 [23:23:20<1:53:52, 12.29s/it] + +{'loss': 0.4747, 'learning_rate': 2.964294067193008e-07, 'epoch': 0.92} + + 92%|█████████▏| 6822/7378 [23:23:20<1:53:52, 12.29s/it] + 92%|█████████▏| 6823/7378 [23:23:32<1:53:46, 12.30s/it] + +{'loss': 0.4624, 'learning_rate': 2.953693485372333e-07, 'epoch': 0.92} + + 92%|█████████▏| 6823/7378 [23:23:32<1:53:46, 12.30s/it] + 92%|█████████▏| 6824/7378 [23:23:44<1:52:52, 12.23s/it] + +{'loss': 0.4516, 'learning_rate': 2.9431116077063726e-07, 'epoch': 0.92} + + 92%|█████████▏| 6824/7378 [23:23:44<1:52:52, 12.23s/it] + 93%|█████████▎| 6825/7378 [23:23:57<1:53:30, 12.32s/it] + +{'loss': 0.4205, 
'learning_rate': 2.9325484362345945e-07, 'epoch': 0.93} + + 93%|█████████▎| 6825/7378 [23:23:57<1:53:30, 12.32s/it] + 93%|█████████▎| 6826/7378 [23:24:09<1:54:06, 12.40s/it] + +{'loss': 0.4587, 'learning_rate': 2.922003972992904e-07, 'epoch': 0.93} + + 93%|█████████▎| 6826/7378 [23:24:09<1:54:06, 12.40s/it] + 93%|█████████▎| 6827/7378 [23:24:21<1:52:39, 12.27s/it] + +{'loss': 0.3615, 'learning_rate': 2.9114782200135525e-07, 'epoch': 0.93} + + 93%|█████████▎| 6827/7378 [23:24:21<1:52:39, 12.27s/it] + 93%|█████████▎| 6828/7378 [23:24:33<1:52:25, 12.26s/it] + +{'loss': 0.4222, 'learning_rate': 2.9009711793252516e-07, 'epoch': 0.93} + + 93%|█████████▎| 6828/7378 [23:24:33<1:52:25, 12.26s/it] + 93%|█████████▎| 6829/7378 [23:24:45<1:51:16, 12.16s/it] + +{'loss': 0.4312, 'learning_rate': 2.8904828529530473e-07, 'epoch': 0.93} + + 93%|█████████▎| 6829/7378 [23:24:45<1:51:16, 12.16s/it] + 93%|█████████▎| 6830/7378 [23:24:57<1:50:28, 12.10s/it] + +{'loss': 0.4379, 'learning_rate': 2.8800132429184004e-07, 'epoch': 0.93} + + 93%|█████████▎| 6830/7378 [23:24:57<1:50:28, 12.10s/it] + 93%|█████████▎| 6831/7378 [23:25:09<1:50:14, 12.09s/it] + +{'loss': 0.4621, 'learning_rate': 2.8695623512391634e-07, 'epoch': 0.93} + + 93%|█████████▎| 6831/7378 [23:25:09<1:50:14, 12.09s/it] + 93%|█████████▎| 6832/7378 [23:25:22<1:50:10, 12.11s/it] + +{'loss': 0.3847, 'learning_rate': 2.859130179929581e-07, 'epoch': 0.93} + + 93%|█████████▎| 6832/7378 [23:25:22<1:50:10, 12.11s/it] + 93%|█████████▎| 6833/7378 [23:25:34<1:50:15, 12.14s/it] + +{'loss': 0.4505, 'learning_rate': 2.8487167310002894e-07, 'epoch': 0.93} + + 93%|█████████▎| 6833/7378 [23:25:34<1:50:15, 12.14s/it] + 93%|█████████▎| 6834/7378 [23:25:46<1:49:44, 12.10s/it] + +{'loss': 0.3722, 'learning_rate': 2.838322006458327e-07, 'epoch': 0.93} + + 93%|█████████▎| 6834/7378 [23:25:46<1:49:44, 12.10s/it] + 93%|█████████▎| 6835/7378 [23:25:58<1:50:42, 12.23s/it] + +{'loss': 0.5103, 'learning_rate': 2.8279460083071255e-07, 'epoch': 0.93} + + 
93%|█████████▎| 6835/7378 [23:25:58<1:50:42, 12.23s/it] + 93%|█████████▎| 6836/7378 [23:26:11<1:52:08, 12.41s/it] + +{'loss': 0.4286, 'learning_rate': 2.817588738546473e-07, 'epoch': 0.93} + + 93%|█████████▎| 6836/7378 [23:26:11<1:52:08, 12.41s/it] + 93%|█████████▎| 6837/7378 [23:26:23<1:50:55, 12.30s/it] + +{'loss': 0.4325, 'learning_rate': 2.807250199172573e-07, 'epoch': 0.93} + + 93%|█████████▎| 6837/7378 [23:26:23<1:50:55, 12.30s/it] + 93%|█████████▎| 6838/7378 [23:26:35<1:50:23, 12.27s/it] + +{'loss': 0.4112, 'learning_rate': 2.79693039217801e-07, 'epoch': 0.93} + + 93%|█████████▎| 6838/7378 [23:26:35<1:50:23, 12.27s/it] + 93%|█████████▎| 6839/7378 [23:26:48<1:50:26, 12.29s/it] + +{'loss': 0.3836, 'learning_rate': 2.7866293195517923e-07, 'epoch': 0.93} + + 93%|█████████▎| 6839/7378 [23:26:48<1:50:26, 12.29s/it] + 93%|█████████▎| 6840/7378 [23:27:00<1:50:15, 12.30s/it] + +{'loss': 0.4209, 'learning_rate': 2.7763469832792767e-07, 'epoch': 0.93} + + 93%|█████████▎| 6840/7378 [23:27:00<1:50:15, 12.30s/it] + 93%|█████████▎| 6841/7378 [23:27:12<1:49:28, 12.23s/it] + +{'loss': 0.4552, 'learning_rate': 2.766083385342222e-07, 'epoch': 0.93} + + 93%|█████████▎| 6841/7378 [23:27:12<1:49:28, 12.23s/it] + 93%|█████████▎| 6842/7378 [23:27:24<1:48:40, 12.16s/it] + +{'loss': 0.4436, 'learning_rate': 2.755838527718757e-07, 'epoch': 0.93} + + 93%|█████████▎| 6842/7378 [23:27:24<1:48:40, 12.16s/it] + 93%|█████████▎| 6843/7378 [23:27:36<1:48:39, 12.19s/it] + +{'loss': 0.4775, 'learning_rate': 2.745612412383447e-07, 'epoch': 0.93} + + 93%|█████████▎| 6843/7378 [23:27:36<1:48:39, 12.19s/it] + 93%|█████████▎| 6844/7378 [23:27:49<1:48:54, 12.24s/it] + +{'loss': 0.4163, 'learning_rate': 2.735405041307215e-07, 'epoch': 0.93} + + 93%|█████████▎| 6844/7378 [23:27:49<1:48:54, 12.24s/it] + 93%|█████████▎| 6845/7378 [23:28:01<1:49:28, 12.32s/it] + +{'loss': 0.3937, 'learning_rate': 2.725216416457344e-07, 'epoch': 0.93} + + 93%|█████████▎| 6845/7378 [23:28:01<1:49:28, 12.32s/it] + 
93%|█████████▎| 6846/7378 [23:28:13<1:48:40, 12.26s/it] + +{'loss': 0.4494, 'learning_rate': 2.7150465397975613e-07, 'epoch': 0.93} + + 93%|█████████▎| 6846/7378 [23:28:13<1:48:40, 12.26s/it] + 93%|█████████▎| 6847/7378 [23:28:26<1:48:47, 12.29s/it] + +{'loss': 0.4061, 'learning_rate': 2.7048954132879115e-07, 'epoch': 0.93} + + 93%|█████████▎| 6847/7378 [23:28:26<1:48:47, 12.29s/it] + 93%|█████████▎| 6848/7378 [23:28:38<1:49:03, 12.35s/it] + +{'loss': 0.4295, 'learning_rate': 2.6947630388849175e-07, 'epoch': 0.93} + + 93%|█████████▎| 6848/7378 [23:28:38<1:49:03, 12.35s/it] + 93%|█████████▎| 6849/7378 [23:28:51<1:49:45, 12.45s/it] + +{'loss': 0.4159, 'learning_rate': 2.6846494185414076e-07, 'epoch': 0.93} + + 93%|█████████▎| 6849/7378 [23:28:51<1:49:45, 12.45s/it] + 93%|█████████▎| 6850/7378 [23:29:03<1:48:40, 12.35s/it] + +{'loss': 0.3907, 'learning_rate': 2.674554554206621e-07, 'epoch': 0.93} + + 93%|█████████▎| 6850/7378 [23:29:03<1:48:40, 12.35s/it] + 93%|█████████▎| 6851/7378 [23:29:15<1:47:58, 12.29s/it] + +{'loss': 0.502, 'learning_rate': 2.6644784478261797e-07, 'epoch': 0.93} + + 93%|█████████▎| 6851/7378 [23:29:15<1:47:58, 12.29s/it] + 93%|█████████▎| 6852/7378 [23:29:28<1:48:00, 12.32s/it] + +{'loss': 0.3399, 'learning_rate': 2.6544211013421084e-07, 'epoch': 0.93} + + 93%|█████████▎| 6852/7378 [23:29:28<1:48:00, 12.32s/it] + 93%|█████████▎| 6853/7378 [23:29:40<1:47:18, 12.26s/it] + +{'loss': 0.4101, 'learning_rate': 2.644382516692812e-07, 'epoch': 0.93} + + 93%|█████████▎| 6853/7378 [23:29:40<1:47:18, 12.26s/it] + 93%|█████████▎| 6854/7378 [23:29:52<1:47:55, 12.36s/it] + +{'loss': 0.3416, 'learning_rate': 2.634362695813053e-07, 'epoch': 0.93} + + 93%|█████████▎| 6854/7378 [23:29:52<1:47:55, 12.36s/it] + 93%|█████████▎| 6855/7378 [23:30:05<1:47:30, 12.33s/it] + +{'loss': 0.4337, 'learning_rate': 2.624361640633999e-07, 'epoch': 0.93} + + 93%|█████████▎| 6855/7378 [23:30:05<1:47:30, 12.33s/it] + 93%|█████████▎| 6856/7378 [23:30:17<1:46:57, 12.29s/it] + 
+{'loss': 0.4365, 'learning_rate': 2.6143793530831853e-07, 'epoch': 0.93} + + 93%|█████████▎| 6856/7378 [23:30:17<1:46:57, 12.29s/it] + 93%|█████████▎| 6857/7378 [23:30:29<1:45:46, 12.18s/it] + +{'loss': 0.4603, 'learning_rate': 2.604415835084562e-07, 'epoch': 0.93} + + 93%|█████████▎| 6857/7378 [23:30:29<1:45:46, 12.18s/it] + 93%|█████████▎| 6858/7378 [23:30:41<1:44:45, 12.09s/it] + +{'loss': 0.4133, 'learning_rate': 2.594471088558437e-07, 'epoch': 0.93} + + 93%|█████████▎| 6858/7378 [23:30:41<1:44:45, 12.09s/it] + 93%|█████████▎| 6859/7378 [23:30:53<1:44:25, 12.07s/it] + +{'loss': 0.3564, 'learning_rate': 2.5845451154214994e-07, 'epoch': 0.93} + + 93%|█████████▎| 6859/7378 [23:30:53<1:44:25, 12.07s/it] + 93%|█████████▎| 6860/7378 [23:31:05<1:46:17, 12.31s/it] + +{'loss': 0.4423, 'learning_rate': 2.57463791758682e-07, 'epoch': 0.93} + + 93%|█████████▎| 6860/7378 [23:31:05<1:46:17, 12.31s/it] + 93%|█████████▎| 6861/7378 [23:31:18<1:46:21, 12.34s/it] + +{'loss': 0.4264, 'learning_rate': 2.56474949696387e-07, 'epoch': 0.93} + + 93%|█████████▎| 6861/7378 [23:31:18<1:46:21, 12.34s/it] + 93%|█████████▎| 6862/7378 [23:31:30<1:45:22, 12.25s/it] + +{'loss': 0.3832, 'learning_rate': 2.5548798554584695e-07, 'epoch': 0.93} + + 93%|█████████▎| 6862/7378 [23:31:30<1:45:22, 12.25s/it] + 93%|█████████▎| 6863/7378 [23:31:42<1:45:16, 12.27s/it] + +{'loss': 0.4425, 'learning_rate': 2.5450289949728536e-07, 'epoch': 0.93} + + 93%|█████████▎| 6863/7378 [23:31:42<1:45:16, 12.27s/it] + 93%|█████████▎| 6864/7378 [23:31:55<1:45:26, 12.31s/it] + +{'loss': 0.3418, 'learning_rate': 2.5351969174056133e-07, 'epoch': 0.93} + + 93%|█████████▎| 6864/7378 [23:31:55<1:45:26, 12.31s/it] + 93%|█████████▎| 6865/7378 [23:32:07<1:44:49, 12.26s/it] + +{'loss': 0.5159, 'learning_rate': 2.525383624651723e-07, 'epoch': 0.93} + + 93%|█████████▎| 6865/7378 [23:32:07<1:44:49, 12.26s/it] + 93%|█████████▎| 6866/7378 [23:32:19<1:43:55, 12.18s/it] + +{'loss': 0.4904, 'learning_rate': 2.515589118602557e-07, 'epoch': 
0.93} + + 93%|█████████▎| 6866/7378 [23:32:19<1:43:55, 12.18s/it] + 93%|█████████▎| 6867/7378 [23:32:31<1:44:10, 12.23s/it] + +{'loss': 0.4225, 'learning_rate': 2.50581340114584e-07, 'epoch': 0.93} + + 93%|█████████▎| 6867/7378 [23:32:31<1:44:10, 12.23s/it] + 93%|█████████▎| 6868/7378 [23:32:44<1:45:30, 12.41s/it] + +{'loss': 0.4276, 'learning_rate': 2.496056474165687e-07, 'epoch': 0.93} + + 93%|█████████▎| 6868/7378 [23:32:44<1:45:30, 12.41s/it] + 93%|█████████▎| 6869/7378 [23:32:56<1:44:14, 12.29s/it] + +{'loss': 0.4366, 'learning_rate': 2.4863183395425816e-07, 'epoch': 0.93} + + 93%|█████████▎| 6869/7378 [23:32:56<1:44:14, 12.29s/it] + 93%|█████████▎| 6870/7378 [23:33:08<1:44:39, 12.36s/it] + +{'loss': 0.4565, 'learning_rate': 2.4765989991534344e-07, 'epoch': 0.93} + + 93%|█████████▎| 6870/7378 [23:33:08<1:44:39, 12.36s/it] + 93%|█████████▎| 6871/7378 [23:33:20<1:42:55, 12.18s/it] + +{'loss': 0.4079, 'learning_rate': 2.466898454871469e-07, 'epoch': 0.93} + + 93%|█████████▎| 6871/7378 [23:33:20<1:42:55, 12.18s/it] + 93%|█████████▎| 6872/7378 [23:33:33<1:43:34, 12.28s/it] + +{'loss': 0.4096, 'learning_rate': 2.4572167085663124e-07, 'epoch': 0.93} + + 93%|█████████▎| 6872/7378 [23:33:33<1:43:34, 12.28s/it] + 93%|█████████▎| 6873/7378 [23:33:45<1:42:44, 12.21s/it] + +{'loss': 0.4111, 'learning_rate': 2.4475537621039715e-07, 'epoch': 0.93} + + 93%|█████████▎| 6873/7378 [23:33:45<1:42:44, 12.21s/it] + 93%|█████████▎| 6874/7378 [23:33:57<1:43:19, 12.30s/it] + +{'loss': 0.4334, 'learning_rate': 2.4379096173468343e-07, 'epoch': 0.93} + + 93%|█████████▎| 6874/7378 [23:33:57<1:43:19, 12.30s/it] + 93%|█████████▎| 6875/7378 [23:34:09<1:42:46, 12.26s/it] + +{'loss': 0.3782, 'learning_rate': 2.4282842761536586e-07, 'epoch': 0.93} + + 93%|█████████▎| 6875/7378 [23:34:09<1:42:46, 12.26s/it] + 93%|█████████▎| 6876/7378 [23:34:22<1:42:32, 12.26s/it] + +{'loss': 0.4838, 'learning_rate': 2.4186777403795714e-07, 'epoch': 0.93} + + 93%|█████████▎| 6876/7378 [23:34:22<1:42:32, 
12.26s/it] + 93%|█████████▎| 6877/7378 [23:34:34<1:41:39, 12.17s/it] + +{'loss': 0.4307, 'learning_rate': 2.409090011876081e-07, 'epoch': 0.93} + + 93%|█████████▎| 6877/7378 [23:34:34<1:41:39, 12.17s/it] + 93%|█████████▎| 6878/7378 [23:34:46<1:41:08, 12.14s/it] + +{'loss': 0.4359, 'learning_rate': 2.399521092491075e-07, 'epoch': 0.93} + + 93%|█████████▎| 6878/7378 [23:34:46<1:41:08, 12.14s/it] + 93%|█████████▎| 6879/7378 [23:34:58<1:41:04, 12.15s/it] + +{'loss': 0.3882, 'learning_rate': 2.3899709840688124e-07, 'epoch': 0.93} + + 93%|█████████▎| 6879/7378 [23:34:58<1:41:04, 12.15s/it] + 93%|█████████▎| 6880/7378 [23:35:10<1:40:51, 12.15s/it] + +{'loss': 0.3488, 'learning_rate': 2.3804396884499313e-07, 'epoch': 0.93} + + 93%|█████████▎| 6880/7378 [23:35:10<1:40:51, 12.15s/it] + 93%|█████████▎| 6881/7378 [23:35:22<1:40:08, 12.09s/it] + +{'loss': 0.3993, 'learning_rate': 2.3709272074714408e-07, 'epoch': 0.93} + + 93%|█████████▎| 6881/7378 [23:35:22<1:40:08, 12.09s/it] + 93%|█████████▎| 6882/7378 [23:35:34<1:40:15, 12.13s/it] + +{'loss': 0.4508, 'learning_rate': 2.361433542966718e-07, 'epoch': 0.93} + + 93%|█████████▎| 6882/7378 [23:35:34<1:40:15, 12.13s/it] + 93%|█████████▎| 6883/7378 [23:35:46<1:40:20, 12.16s/it] + +{'loss': 0.4432, 'learning_rate': 2.3519586967655217e-07, 'epoch': 0.93} + + 93%|█████████▎| 6883/7378 [23:35:46<1:40:20, 12.16s/it] + 93%|█████████▎| 6884/7378 [23:35:59<1:39:56, 12.14s/it] + +{'loss': 0.4562, 'learning_rate': 2.3425026706939692e-07, 'epoch': 0.93} + + 93%|█████████▎| 6884/7378 [23:35:59<1:39:56, 12.14s/it] + 93%|█████████▎| 6885/7378 [23:36:11<1:39:36, 12.12s/it] + +{'loss': 0.4147, 'learning_rate': 2.333065466574569e-07, 'epoch': 0.93} + + 93%|█████████▎| 6885/7378 [23:36:11<1:39:36, 12.12s/it] + 93%|█████████▎| 6886/7378 [23:36:23<1:39:52, 12.18s/it] + +{'loss': 0.3854, 'learning_rate': 2.3236470862261996e-07, 'epoch': 0.93} + + 93%|█████████▎| 6886/7378 [23:36:23<1:39:52, 12.18s/it] + 93%|█████████▎| 6887/7378 [23:36:35<1:38:51, 
12.08s/it] + +{'loss': 0.4099, 'learning_rate': 2.3142475314640867e-07, 'epoch': 0.93} + + 93%|█████████▎| 6887/7378 [23:36:35<1:38:51, 12.08s/it] + 93%|█████████▎| 6888/7378 [23:36:47<1:39:16, 12.16s/it] + +{'loss': 0.3667, 'learning_rate': 2.30486680409987e-07, 'epoch': 0.93} + + 93%|█████████▎| 6888/7378 [23:36:47<1:39:16, 12.16s/it] + 93%|█████████▎| 6889/7378 [23:37:00<1:40:26, 12.32s/it] + +{'loss': 0.4285, 'learning_rate': 2.2955049059415258e-07, 'epoch': 0.93} + + 93%|█████████▎| 6889/7378 [23:37:00<1:40:26, 12.32s/it] + 93%|█████████▎| 6890/7378 [23:37:12<1:40:43, 12.38s/it] + +{'loss': 0.4398, 'learning_rate': 2.2861618387934213e-07, 'epoch': 0.93} + + 93%|█████████▎| 6890/7378 [23:37:12<1:40:43, 12.38s/it] + 93%|█████████▎| 6891/7378 [23:37:25<1:39:58, 12.32s/it] + +{'loss': 0.4183, 'learning_rate': 2.2768376044562834e-07, 'epoch': 0.93} + + 93%|█████████▎| 6891/7378 [23:37:25<1:39:58, 12.32s/it] + 93%|█████████▎| 6892/7378 [23:37:37<1:39:21, 12.27s/it] + +{'loss': 0.407, 'learning_rate': 2.2675322047271963e-07, 'epoch': 0.93} + + 93%|█████████▎| 6892/7378 [23:37:37<1:39:21, 12.27s/it] + 93%|█████████▎| 6893/7378 [23:37:49<1:39:58, 12.37s/it] + +{'loss': 0.4079, 'learning_rate': 2.258245641399648e-07, 'epoch': 0.93} + + 93%|█████████▎| 6893/7378 [23:37:49<1:39:58, 12.37s/it] + 93%|█████████▎| 6894/7378 [23:38:02<1:39:52, 12.38s/it] + +{'loss': 0.4786, 'learning_rate': 2.248977916263473e-07, 'epoch': 0.93} + + 93%|█████████▎| 6894/7378 [23:38:02<1:39:52, 12.38s/it] + 93%|█████████▎| 6895/7378 [23:38:14<1:39:52, 12.41s/it] + +{'loss': 0.404, 'learning_rate': 2.2397290311048868e-07, 'epoch': 0.93} + + 93%|█████████▎| 6895/7378 [23:38:14<1:39:52, 12.41s/it] + 93%|█████████▎| 6896/7378 [23:38:27<1:39:56, 12.44s/it] + +{'loss': 0.4567, 'learning_rate': 2.2304989877064643e-07, 'epoch': 0.93} + + 93%|█████████▎| 6896/7378 [23:38:27<1:39:56, 12.44s/it] + 93%|█████████▎| 6897/7378 [23:38:39<1:39:00, 12.35s/it] + +{'loss': 0.4284, 'learning_rate': 
2.2212877878471372e-07, 'epoch': 0.93} + + 93%|█████████▎| 6897/7378 [23:38:39<1:39:00, 12.35s/it] + 93%|█████████▎| 6898/7378 [23:38:52<1:40:21, 12.54s/it] + +{'loss': 0.4554, 'learning_rate': 2.2120954333022304e-07, 'epoch': 0.93} + + 93%|█████████▎| 6898/7378 [23:38:52<1:40:21, 12.54s/it] + 94%|█████████▎| 6899/7378 [23:39:04<1:39:03, 12.41s/it] + +{'loss': 0.4266, 'learning_rate': 2.2029219258434376e-07, 'epoch': 0.94} + + 94%|█████████▎| 6899/7378 [23:39:04<1:39:03, 12.41s/it] + 94%|█████████▎| 6900/7378 [23:39:16<1:39:04, 12.44s/it] + +{'loss': 0.4377, 'learning_rate': 2.193767267238789e-07, 'epoch': 0.94} + + 94%|█████████▎| 6900/7378 [23:39:16<1:39:04, 12.44s/it] + 94%|█████████▎| 6901/7378 [23:39:28<1:38:03, 12.33s/it] + +{'loss': 0.5002, 'learning_rate': 2.1846314592527172e-07, 'epoch': 0.94} + + 94%|█████████▎| 6901/7378 [23:39:29<1:38:03, 12.33s/it] + 94%|█████████▎| 6902/7378 [23:39:40<1:36:48, 12.20s/it] + +{'loss': 0.4319, 'learning_rate': 2.1755145036459814e-07, 'epoch': 0.94} + + 94%|█████████▎| 6902/7378 [23:39:40<1:36:48, 12.20s/it] + 94%|█████████▎| 6903/7378 [23:39:53<1:36:49, 12.23s/it] + +{'loss': 0.4424, 'learning_rate': 2.1664164021757638e-07, 'epoch': 0.94} + + 94%|█████████▎| 6903/7378 [23:39:53<1:36:49, 12.23s/it] + 94%|█████████▎| 6904/7378 [23:40:05<1:35:56, 12.14s/it] + +{'loss': 0.4141, 'learning_rate': 2.1573371565955736e-07, 'epoch': 0.94} + + 94%|█████████▎| 6904/7378 [23:40:05<1:35:56, 12.14s/it] + 94%|█████████▎| 6905/7378 [23:40:17<1:35:07, 12.07s/it] + +{'loss': 0.4018, 'learning_rate': 2.1482767686552774e-07, 'epoch': 0.94} + + 94%|█████████▎| 6905/7378 [23:40:17<1:35:07, 12.07s/it] + 94%|█████████▎| 6906/7378 [23:40:29<1:35:06, 12.09s/it] + +{'loss': 0.3983, 'learning_rate': 2.139235240101134e-07, 'epoch': 0.94} + + 94%|█████████▎| 6906/7378 [23:40:29<1:35:06, 12.09s/it] + 94%|█████████▎| 6907/7378 [23:40:41<1:35:30, 12.17s/it] + +{'loss': 0.473, 'learning_rate': 2.1302125726757383e-07, 'epoch': 0.94} + + 94%|█████████▎| 
6907/7378 [23:40:41<1:35:30, 12.17s/it] + 94%|█████████▎| 6908/7378 [23:40:53<1:35:10, 12.15s/it] + +{'loss': 0.4175, 'learning_rate': 2.1212087681180993e-07, 'epoch': 0.94} + + 94%|█████████▎| 6908/7378 [23:40:53<1:35:10, 12.15s/it] + 94%|█████████▎| 6909/7378 [23:41:05<1:35:07, 12.17s/it] + +{'loss': 0.4107, 'learning_rate': 2.112223828163551e-07, 'epoch': 0.94} + + 94%|█████████▎| 6909/7378 [23:41:05<1:35:07, 12.17s/it] + 94%|█████████▎| 6910/7378 [23:41:18<1:36:01, 12.31s/it] + +{'loss': 0.4447, 'learning_rate': 2.103257754543786e-07, 'epoch': 0.94} + + 94%|█████████▎| 6910/7378 [23:41:18<1:36:01, 12.31s/it] + 94%|█████████▎| 6911/7378 [23:41:30<1:35:42, 12.30s/it] + +{'loss': 0.4246, 'learning_rate': 2.0943105489868666e-07, 'epoch': 0.94} + + 94%|█████████▎| 6911/7378 [23:41:30<1:35:42, 12.30s/it] + 94%|█████████▎| 6912/7378 [23:41:43<1:37:25, 12.55s/it] + +{'loss': 0.4526, 'learning_rate': 2.0853822132172574e-07, 'epoch': 0.94} + + 94%|█████████▎| 6912/7378 [23:41:43<1:37:25, 12.55s/it] + 94%|█████████▎| 6913/7378 [23:41:56<1:37:33, 12.59s/it] + +{'loss': 0.456, 'learning_rate': 2.076472748955727e-07, 'epoch': 0.94} + + 94%|█████████▎| 6913/7378 [23:41:56<1:37:33, 12.59s/it] + 94%|█████████▎| 6914/7378 [23:42:08<1:36:08, 12.43s/it] + +{'loss': 0.386, 'learning_rate': 2.0675821579194567e-07, 'epoch': 0.94} + + 94%|█████████▎| 6914/7378 [23:42:08<1:36:08, 12.43s/it] + 94%|█████████▎| 6915/7378 [23:42:20<1:35:46, 12.41s/it] + +{'loss': 0.4003, 'learning_rate': 2.058710441821954e-07, 'epoch': 0.94} + + 94%|█████████▎| 6915/7378 [23:42:20<1:35:46, 12.41s/it] + 94%|█████████▎| 6916/7378 [23:42:33<1:36:10, 12.49s/it] + +{'loss': 0.4064, 'learning_rate': 2.0498576023731064e-07, 'epoch': 0.94} + + 94%|█████████▎| 6916/7378 [23:42:33<1:36:10, 12.49s/it] + 94%|█████████▍| 6917/7378 [23:42:46<1:35:43, 12.46s/it] + +{'loss': 0.434, 'learning_rate': 2.0410236412791606e-07, 'epoch': 0.94} + + 94%|█████████▍| 6917/7378 [23:42:46<1:35:43, 12.46s/it] + 94%|█████████▍| 
6918/7378 [23:42:58<1:35:25, 12.45s/it] + +{'loss': 0.4334, 'learning_rate': 2.032208560242732e-07, 'epoch': 0.94} + + 94%|█████████▍| 6918/7378 [23:42:58<1:35:25, 12.45s/it] + 94%|█████████▍| 6919/7378 [23:43:10<1:35:08, 12.44s/it] + +{'loss': 0.4348, 'learning_rate': 2.0234123609627732e-07, 'epoch': 0.94} + + 94%|█████████▍| 6919/7378 [23:43:10<1:35:08, 12.44s/it] + 94%|█████████▍| 6920/7378 [23:43:23<1:35:00, 12.45s/it] + +{'loss': 0.415, 'learning_rate': 2.0146350451346275e-07, 'epoch': 0.94} + + 94%|█████████▍| 6920/7378 [23:43:23<1:35:00, 12.45s/it] + 94%|█████████▍| 6921/7378 [23:43:35<1:34:48, 12.45s/it] + +{'loss': 0.437, 'learning_rate': 2.0058766144499642e-07, 'epoch': 0.94} + + 94%|█████████▍| 6921/7378 [23:43:35<1:34:48, 12.45s/it] + 94%|█████████▍| 6922/7378 [23:43:48<1:34:35, 12.45s/it] + +{'loss': 0.4094, 'learning_rate': 1.9971370705968663e-07, 'epoch': 0.94} + + 94%|█████████▍| 6922/7378 [23:43:48<1:34:35, 12.45s/it] + 94%|█████████▍| 6923/7378 [23:44:00<1:34:51, 12.51s/it] + +{'loss': 0.3943, 'learning_rate': 1.9884164152597307e-07, 'epoch': 0.94} + + 94%|█████████▍| 6923/7378 [23:44:00<1:34:51, 12.51s/it] + 94%|█████████▍| 6924/7378 [23:44:12<1:33:30, 12.36s/it] + +{'loss': 0.4223, 'learning_rate': 1.9797146501193243e-07, 'epoch': 0.94} + + 94%|█████████▍| 6924/7378 [23:44:12<1:33:30, 12.36s/it] + 94%|█████████▍| 6925/7378 [23:44:24<1:32:13, 12.22s/it] + +{'loss': 0.4442, 'learning_rate': 1.971031776852772e-07, 'epoch': 0.94} + + 94%|█████████▍| 6925/7378 [23:44:24<1:32:13, 12.22s/it] + 94%|█████████▍| 6926/7378 [23:44:37<1:32:11, 12.24s/it] + +{'loss': 0.4557, 'learning_rate': 1.9623677971335464e-07, 'epoch': 0.94} + + 94%|█████████▍| 6926/7378 [23:44:37<1:32:11, 12.24s/it] + 94%|█████████▍| 6927/7378 [23:44:49<1:31:53, 12.22s/it] + +{'loss': 0.4345, 'learning_rate': 1.9537227126315338e-07, 'epoch': 0.94} + + 94%|█████████▍| 6927/7378 [23:44:49<1:31:53, 12.22s/it] + 94%|█████████▍| 6928/7378 [23:45:01<1:32:07, 12.28s/it] + +{'loss': 0.4039, 
'learning_rate': 1.9450965250129127e-07, 'epoch': 0.94} + + 94%|█████████▍| 6928/7378 [23:45:01<1:32:07, 12.28s/it] + 94%|█████████▍| 6929/7378 [23:45:13<1:31:56, 12.29s/it] + +{'loss': 0.4753, 'learning_rate': 1.936489235940242e-07, 'epoch': 0.94} + + 94%|█████████▍| 6929/7378 [23:45:13<1:31:56, 12.29s/it] + 94%|█████████▍| 6930/7378 [23:45:26<1:31:37, 12.27s/it] + +{'loss': 0.4376, 'learning_rate': 1.9279008470724502e-07, 'epoch': 0.94} + + 94%|█████████▍| 6930/7378 [23:45:26<1:31:37, 12.27s/it] + 94%|█████████▍| 6931/7378 [23:45:39<1:32:39, 12.44s/it] + +{'loss': 0.4635, 'learning_rate': 1.9193313600648244e-07, 'epoch': 0.94} + + 94%|█████████▍| 6931/7378 [23:45:39<1:32:39, 12.44s/it] + 94%|█████████▍| 6932/7378 [23:45:51<1:32:02, 12.38s/it] + +{'loss': 0.4332, 'learning_rate': 1.910780776568977e-07, 'epoch': 0.94} + + 94%|█████████▍| 6932/7378 [23:45:51<1:32:02, 12.38s/it] + 94%|█████████▍| 6933/7378 [23:46:03<1:30:25, 12.19s/it] + +{'loss': 0.4084, 'learning_rate': 1.9022490982329221e-07, 'epoch': 0.94} + + 94%|█████████▍| 6933/7378 [23:46:03<1:30:25, 12.19s/it] + 94%|█████████▍| 6934/7378 [23:46:15<1:30:21, 12.21s/it] + +{'loss': 0.4518, 'learning_rate': 1.8937363267009901e-07, 'epoch': 0.94} + + 94%|█████████▍| 6934/7378 [23:46:15<1:30:21, 12.21s/it] + 94%|█████████▍| 6935/7378 [23:46:27<1:29:42, 12.15s/it] + +{'loss': 0.451, 'learning_rate': 1.88524246361389e-07, 'epoch': 0.94} + + 94%|█████████▍| 6935/7378 [23:46:27<1:29:42, 12.15s/it] + 94%|█████████▍| 6936/7378 [23:46:39<1:29:05, 12.09s/it] + +{'loss': 0.4257, 'learning_rate': 1.876767510608679e-07, 'epoch': 0.94} + + 94%|█████████▍| 6936/7378 [23:46:39<1:29:05, 12.09s/it] + 94%|█████████▍| 6937/7378 [23:46:51<1:28:42, 12.07s/it] + +{'loss': 0.4919, 'learning_rate': 1.8683114693187731e-07, 'epoch': 0.94} + + 94%|█████████▍| 6937/7378 [23:46:51<1:28:42, 12.07s/it] + 94%|█████████▍| 6938/7378 [23:47:03<1:29:02, 12.14s/it] + +{'loss': 0.4208, 'learning_rate': 1.8598743413739462e-07, 'epoch': 0.94} + + 
94%|█████████▍| 6938/7378 [23:47:03<1:29:02, 12.14s/it] + 94%|█████████▍| 6939/7378 [23:47:15<1:28:50, 12.14s/it] + +{'loss': 0.4141, 'learning_rate': 1.8514561284003085e-07, 'epoch': 0.94} + + 94%|█████████▍| 6939/7378 [23:47:15<1:28:50, 12.14s/it] + 94%|█████████▍| 6940/7378 [23:47:27<1:28:37, 12.14s/it] + +{'loss': 0.4533, 'learning_rate': 1.8430568320203512e-07, 'epoch': 0.94} + + 94%|█████████▍| 6940/7378 [23:47:27<1:28:37, 12.14s/it] + 94%|█████████▍| 6941/7378 [23:47:39<1:27:52, 12.06s/it] + +{'loss': 0.4381, 'learning_rate': 1.8346764538529127e-07, 'epoch': 0.94} + + 94%|█████████▍| 6941/7378 [23:47:39<1:27:52, 12.06s/it] + 94%|█████████▍| 6942/7378 [23:47:52<1:28:08, 12.13s/it] + +{'loss': 0.4056, 'learning_rate': 1.8263149955131564e-07, 'epoch': 0.94} + + 94%|█████████▍| 6942/7378 [23:47:52<1:28:08, 12.13s/it] + 94%|█████████▍| 6943/7378 [23:48:04<1:27:53, 12.12s/it] + +{'loss': 0.4047, 'learning_rate': 1.817972458612649e-07, 'epoch': 0.94} + + 94%|█████████▍| 6943/7378 [23:48:04<1:27:53, 12.12s/it] + 94%|█████████▍| 6944/7378 [23:48:16<1:28:01, 12.17s/it] + +{'loss': 0.3634, 'learning_rate': 1.8096488447592598e-07, 'epoch': 0.94} + + 94%|█████████▍| 6944/7378 [23:48:16<1:28:01, 12.17s/it] + 94%|█████████▍| 6945/7378 [23:48:28<1:28:40, 12.29s/it] + +{'loss': 0.4163, 'learning_rate': 1.8013441555572607e-07, 'epoch': 0.94} + + 94%|█████████▍| 6945/7378 [23:48:28<1:28:40, 12.29s/it] + 94%|█████████▍| 6946/7378 [23:48:41<1:28:35, 12.30s/it] + +{'loss': 0.473, 'learning_rate': 1.7930583926072275e-07, 'epoch': 0.94} + + 94%|█████████▍| 6946/7378 [23:48:41<1:28:35, 12.30s/it] + 94%|█████████▍| 6947/7378 [23:48:53<1:28:43, 12.35s/it] + +{'loss': 0.4138, 'learning_rate': 1.784791557506127e-07, 'epoch': 0.94} + + 94%|█████████▍| 6947/7378 [23:48:53<1:28:43, 12.35s/it] + 94%|█████████▍| 6948/7378 [23:49:05<1:27:59, 12.28s/it] + +{'loss': 0.4311, 'learning_rate': 1.776543651847251e-07, 'epoch': 0.94} + + 94%|█████████▍| 6948/7378 [23:49:05<1:27:59, 12.28s/it] + 
94%|█████████▍| 6949/7378 [23:49:18<1:28:20, 12.36s/it] + +{'loss': 0.4853, 'learning_rate': 1.7683146772202508e-07, 'epoch': 0.94} + + 94%|█████████▍| 6949/7378 [23:49:18<1:28:20, 12.36s/it] + 94%|█████████▍| 6950/7378 [23:49:30<1:28:00, 12.34s/it] + +{'loss': 0.4089, 'learning_rate': 1.760104635211146e-07, 'epoch': 0.94} + + 94%|█████████▍| 6950/7378 [23:49:30<1:28:00, 12.34s/it] + 94%|█████████▍| 6951/7378 [23:49:42<1:27:27, 12.29s/it] + +{'loss': 0.4187, 'learning_rate': 1.7519135274022824e-07, 'epoch': 0.94} + + 94%|█████████▍| 6951/7378 [23:49:42<1:27:27, 12.29s/it] + 94%|█████████▍| 6952/7378 [23:49:55<1:27:09, 12.28s/it] + +{'loss': 0.4151, 'learning_rate': 1.7437413553723749e-07, 'epoch': 0.94} + + 94%|█████████▍| 6952/7378 [23:49:55<1:27:09, 12.28s/it] + 94%|█████████▍| 6953/7378 [23:50:07<1:26:38, 12.23s/it] + +{'loss': 0.3772, 'learning_rate': 1.7355881206964742e-07, 'epoch': 0.94} + + 94%|█████████▍| 6953/7378 [23:50:07<1:26:38, 12.23s/it] + 94%|█████████▍| 6954/7378 [23:50:19<1:26:46, 12.28s/it] + +{'loss': 0.4133, 'learning_rate': 1.7274538249460015e-07, 'epoch': 0.94} + + 94%|█████████▍| 6954/7378 [23:50:19<1:26:46, 12.28s/it] + 94%|█████████▍| 6955/7378 [23:50:32<1:27:51, 12.46s/it] + +{'loss': 0.4599, 'learning_rate': 1.719338469688714e-07, 'epoch': 0.94} + + 94%|█████████▍| 6955/7378 [23:50:32<1:27:51, 12.46s/it] + 94%|█████████▍| 6956/7378 [23:50:44<1:26:49, 12.35s/it] + +{'loss': 0.4092, 'learning_rate': 1.7112420564887046e-07, 'epoch': 0.94} + + 94%|█████████▍| 6956/7378 [23:50:44<1:26:49, 12.35s/it] + 94%|█████████▍| 6957/7378 [23:50:56<1:26:03, 12.26s/it] + +{'loss': 0.4045, 'learning_rate': 1.703164586906436e-07, 'epoch': 0.94} + + 94%|█████████▍| 6957/7378 [23:50:56<1:26:03, 12.26s/it] + 94%|█████████▍| 6958/7378 [23:51:09<1:26:50, 12.41s/it] + +{'loss': 0.3755, 'learning_rate': 1.6951060624987082e-07, 'epoch': 0.94} + + 94%|█████████▍| 6958/7378 [23:51:09<1:26:50, 12.41s/it] + 94%|█████████▍| 6959/7378 [23:51:21<1:26:45, 12.42s/it] + 
+{'loss': 0.4693, 'learning_rate': 1.6870664848186891e-07, 'epoch': 0.94} + + 94%|█████████▍| 6959/7378 [23:51:21<1:26:45, 12.42s/it] + 94%|█████████▍| 6960/7378 [23:51:35<1:28:02, 12.64s/it] + +{'loss': 0.4261, 'learning_rate': 1.6790458554158728e-07, 'epoch': 0.94} + + 94%|█████████▍| 6960/7378 [23:51:35<1:28:02, 12.64s/it] + 94%|█████████▍| 6961/7378 [23:51:47<1:27:48, 12.63s/it] + +{'loss': 0.4159, 'learning_rate': 1.6710441758361117e-07, 'epoch': 0.94} + + 94%|█████████▍| 6961/7378 [23:51:47<1:27:48, 12.63s/it] + 94%|█████████▍| 6962/7378 [23:51:59<1:26:31, 12.48s/it] + +{'loss': 0.437, 'learning_rate': 1.6630614476216056e-07, 'epoch': 0.94} + + 94%|█████████▍| 6962/7378 [23:51:59<1:26:31, 12.48s/it] + 94%|█████████▍| 6963/7378 [23:52:12<1:26:10, 12.46s/it] + +{'loss': 0.4261, 'learning_rate': 1.6550976723109013e-07, 'epoch': 0.94} + + 94%|█████████▍| 6963/7378 [23:52:12<1:26:10, 12.46s/it] + 94%|█████████▍| 6964/7378 [23:52:24<1:25:26, 12.38s/it] + +{'loss': 0.4482, 'learning_rate': 1.6471528514388824e-07, 'epoch': 0.94} + + 94%|█████████▍| 6964/7378 [23:52:24<1:25:26, 12.38s/it] + 94%|█████████▍| 6965/7378 [23:52:37<1:26:02, 12.50s/it] + +{'loss': 0.4791, 'learning_rate': 1.6392269865368015e-07, 'epoch': 0.94} + + 94%|█████████▍| 6965/7378 [23:52:37<1:26:02, 12.50s/it] + 94%|█████████▍| 6966/7378 [23:52:49<1:25:11, 12.41s/it] + +{'loss': 0.4484, 'learning_rate': 1.631320079132237e-07, 'epoch': 0.94} + + 94%|█████████▍| 6966/7378 [23:52:49<1:25:11, 12.41s/it] + 94%|█████████▍| 6967/7378 [23:53:02<1:25:58, 12.55s/it] + +{'loss': 0.425, 'learning_rate': 1.623432130749125e-07, 'epoch': 0.94} + + 94%|█████████▍| 6967/7378 [23:53:02<1:25:58, 12.55s/it] + 94%|█████████▍| 6968/7378 [23:53:14<1:25:37, 12.53s/it] + +{'loss': 0.4052, 'learning_rate': 1.6155631429077389e-07, 'epoch': 0.94} + + 94%|█████████▍| 6968/7378 [23:53:14<1:25:37, 12.53s/it] + 94%|█████████▍| 6969/7378 [23:53:27<1:25:45, 12.58s/it] + +{'loss': 0.4164, 'learning_rate': 1.6077131171247096e-07, 
'epoch': 0.94} + + 94%|█████████▍| 6969/7378 [23:53:27<1:25:45, 12.58s/it] + 94%|█████████▍| 6970/7378 [23:53:40<1:25:58, 12.64s/it] + +{'loss': 0.4247, 'learning_rate': 1.5998820549130046e-07, 'epoch': 0.94} + + 94%|█████████▍| 6970/7378 [23:53:40<1:25:58, 12.64s/it] + 94%|█████████▍| 6971/7378 [23:53:52<1:25:37, 12.62s/it] + +{'loss': 0.4231, 'learning_rate': 1.5920699577819388e-07, 'epoch': 0.94} + + 94%|█████████▍| 6971/7378 [23:53:52<1:25:37, 12.62s/it] + 94%|█████████▍| 6972/7378 [23:54:05<1:25:10, 12.59s/it] + +{'loss': 0.4203, 'learning_rate': 1.5842768272371523e-07, 'epoch': 0.94} + + 94%|█████████▍| 6972/7378 [23:54:05<1:25:10, 12.59s/it] + 95%|█████████▍| 6973/7378 [23:54:17<1:24:09, 12.47s/it] + +{'loss': 0.4496, 'learning_rate': 1.576502664780688e-07, 'epoch': 0.95} + + 95%|█████████▍| 6973/7378 [23:54:17<1:24:09, 12.47s/it] + 95%|█████████▍| 6974/7378 [23:54:29<1:23:05, 12.34s/it] + +{'loss': 0.4485, 'learning_rate': 1.5687474719108586e-07, 'epoch': 0.95} + + 95%|█████████▍| 6974/7378 [23:54:29<1:23:05, 12.34s/it] + 95%|█████████▍| 6975/7378 [23:54:41<1:22:40, 12.31s/it] + +{'loss': 0.4378, 'learning_rate': 1.5610112501223796e-07, 'epoch': 0.95} + + 95%|█████████▍| 6975/7378 [23:54:41<1:22:40, 12.31s/it] + 95%|█████████▍| 6976/7378 [23:54:53<1:22:11, 12.27s/it] + +{'loss': 0.4538, 'learning_rate': 1.553294000906269e-07, 'epoch': 0.95} + + 95%|█████████▍| 6976/7378 [23:54:53<1:22:11, 12.27s/it] + 95%|█████████▍| 6977/7378 [23:55:06<1:21:34, 12.21s/it] + +{'loss': 0.4165, 'learning_rate': 1.5455957257499043e-07, 'epoch': 0.95} + + 95%|█████████▍| 6977/7378 [23:55:06<1:21:34, 12.21s/it] + 95%|█████████▍| 6978/7378 [23:55:18<1:21:22, 12.21s/it] + +{'loss': 0.447, 'learning_rate': 1.5379164261370317e-07, 'epoch': 0.95} + + 95%|█████████▍| 6978/7378 [23:55:18<1:21:22, 12.21s/it] + 95%|█████████▍| 6979/7378 [23:55:30<1:21:06, 12.20s/it] + +{'loss': 0.4403, 'learning_rate': 1.5302561035477003e-07, 'epoch': 0.95} + + 95%|█████████▍| 6979/7378 
[23:55:30<1:21:06, 12.20s/it] + 95%|█████████▍| 6980/7378 [23:55:42<1:20:37, 12.16s/it] + +{'loss': 0.4143, 'learning_rate': 1.5226147594583073e-07, 'epoch': 0.95} + + 95%|█████████▍| 6980/7378 [23:55:42<1:20:37, 12.16s/it] + 95%|█████████▍| 6981/7378 [23:55:54<1:20:22, 12.15s/it] + +{'loss': 0.3719, 'learning_rate': 1.5149923953416078e-07, 'epoch': 0.95} + + 95%|█████████▍| 6981/7378 [23:55:54<1:20:22, 12.15s/it] + 95%|█████████▍| 6982/7378 [23:56:06<1:20:07, 12.14s/it] + +{'loss': 0.474, 'learning_rate': 1.5073890126667156e-07, 'epoch': 0.95} + + 95%|█████████▍| 6982/7378 [23:56:06<1:20:07, 12.14s/it] + 95%|█████████▍| 6983/7378 [23:56:18<1:19:54, 12.14s/it] + +{'loss': 0.3986, 'learning_rate': 1.4998046128990362e-07, 'epoch': 0.95} + + 95%|█████████▍| 6983/7378 [23:56:18<1:19:54, 12.14s/it] + 95%|█████████▍| 6984/7378 [23:56:30<1:19:08, 12.05s/it] + +{'loss': 0.3888, 'learning_rate': 1.492239197500356e-07, 'epoch': 0.95} + + 95%|█████████▍| 6984/7378 [23:56:30<1:19:08, 12.05s/it] + 95%|█████████▍| 6985/7378 [23:56:42<1:18:33, 11.99s/it] + +{'loss': 0.43, 'learning_rate': 1.4846927679287747e-07, 'epoch': 0.95} + + 95%|█████████▍| 6985/7378 [23:56:42<1:18:33, 11.99s/it] + 95%|█████████▍| 6986/7378 [23:56:54<1:19:01, 12.10s/it] + +{'loss': 0.4152, 'learning_rate': 1.477165325638763e-07, 'epoch': 0.95} + + 95%|█████████▍| 6986/7378 [23:56:54<1:19:01, 12.10s/it] + 95%|█████████▍| 6987/7378 [23:57:07<1:19:19, 12.17s/it] + +{'loss': 0.4191, 'learning_rate': 1.4696568720811266e-07, 'epoch': 0.95} + + 95%|█████████▍| 6987/7378 [23:57:07<1:19:19, 12.17s/it] + 95%|█████████▍| 6988/7378 [23:57:19<1:19:42, 12.26s/it] + +{'loss': 0.4125, 'learning_rate': 1.462167408702986e-07, 'epoch': 0.95} + + 95%|█████████▍| 6988/7378 [23:57:19<1:19:42, 12.26s/it] + 95%|█████████▍| 6989/7378 [23:57:31<1:19:09, 12.21s/it] + +{'loss': 0.3683, 'learning_rate': 1.4546969369478191e-07, 'epoch': 0.95} + + 95%|█████████▍| 6989/7378 [23:57:31<1:19:09, 12.21s/it] + 95%|█████████▍| 6990/7378 
[23:57:44<1:19:46, 12.34s/it] + +{'loss': 0.4396, 'learning_rate': 1.4472454582554418e-07, 'epoch': 0.95} + + 95%|█████████▍| 6990/7378 [23:57:44<1:19:46, 12.34s/it] + 95%|█████████▍| 6991/7378 [23:57:56<1:19:28, 12.32s/it] + +{'loss': 0.4333, 'learning_rate': 1.439812974062016e-07, 'epoch': 0.95} + + 95%|█████████▍| 6991/7378 [23:57:56<1:19:28, 12.32s/it] + 95%|█████████▍| 6992/7378 [23:58:09<1:19:18, 12.33s/it] + +{'loss': 0.4182, 'learning_rate': 1.43239948580004e-07, 'epoch': 0.95} + + 95%|█████████▍| 6992/7378 [23:58:09<1:19:18, 12.33s/it] + 95%|█████████▍| 6993/7378 [23:58:21<1:19:28, 12.38s/it] + +{'loss': 0.4353, 'learning_rate': 1.4250049948983491e-07, 'epoch': 0.95} + + 95%|█████████▍| 6993/7378 [23:58:21<1:19:28, 12.38s/it] + 95%|█████████▍| 6994/7378 [23:58:33<1:18:29, 12.26s/it] + +{'loss': 0.4178, 'learning_rate': 1.417629502782092e-07, 'epoch': 0.95} + + 95%|█████████▍| 6994/7378 [23:58:33<1:18:29, 12.26s/it] + 95%|█████████▍| 6995/7378 [23:58:45<1:18:35, 12.31s/it] + +{'loss': 0.4264, 'learning_rate': 1.410273010872798e-07, 'epoch': 0.95} + + 95%|█████████▍| 6995/7378 [23:58:45<1:18:35, 12.31s/it] + 95%|█████████▍| 6996/7378 [23:58:58<1:18:21, 12.31s/it] + +{'loss': 0.5116, 'learning_rate': 1.4029355205883222e-07, 'epoch': 0.95} + + 95%|█████████▍| 6996/7378 [23:58:58<1:18:21, 12.31s/it] + 95%|█████████▍| 6997/7378 [23:59:10<1:17:34, 12.22s/it] + +{'loss': 0.4282, 'learning_rate': 1.3956170333428332e-07, 'epoch': 0.95} + + 95%|█████████▍| 6997/7378 [23:59:10<1:17:34, 12.22s/it] + 95%|█████████▍| 6998/7378 [23:59:22<1:17:04, 12.17s/it] + +{'loss': 0.4857, 'learning_rate': 1.3883175505468693e-07, 'epoch': 0.95} + + 95%|█████████▍| 6998/7378 [23:59:22<1:17:04, 12.17s/it] + 95%|█████████▍| 6999/7378 [23:59:34<1:17:14, 12.23s/it] + +{'loss': 0.4154, 'learning_rate': 1.381037073607272e-07, 'epoch': 0.95} + + 95%|█████████▍| 6999/7378 [23:59:34<1:17:14, 12.23s/it] + 95%|█████████▍| 7000/7378 [23:59:47<1:17:16, 12.27s/it] + +{'loss': 0.5183, 
'learning_rate': 1.3737756039272632e-07, 'epoch': 0.95} + + 95%|█████████▍| 7000/7378 [23:59:47<1:17:16, 12.27s/it] + 95%|█████████▍| 7001/7378 [23:59:59<1:16:53, 12.24s/it] + +{'loss': 0.4983, 'learning_rate': 1.3665331429063678e-07, 'epoch': 0.95} + + 95%|█████████▍| 7001/7378 [23:59:59<1:16:53, 12.24s/it] + 95%|█████████▍| 7002/7378 [24:00:11<1:16:59, 12.28s/it] + +{'loss': 0.4195, 'learning_rate': 1.359309691940458e-07, 'epoch': 0.95} + + 95%|█████████▍| 7002/7378 [24:00:11<1:16:59, 12.28s/it] + 95%|█████████▍| 7003/7378 [24:00:23<1:16:51, 12.30s/it] + +{'loss': 0.4578, 'learning_rate': 1.3521052524217315e-07, 'epoch': 0.95} + + 95%|█████████▍| 7003/7378 [24:00:23<1:16:51, 12.30s/it] + 95%|█████████▍| 7004/7378 [24:00:35<1:16:04, 12.21s/it] + +{'loss': 0.4025, 'learning_rate': 1.344919825738733e-07, 'epoch': 0.95} + + 95%|█████████▍| 7004/7378 [24:00:35<1:16:04, 12.21s/it] + 95%|█████████▍| 7005/7378 [24:00:48<1:16:17, 12.27s/it] + +{'loss': 0.4723, 'learning_rate': 1.3377534132763548e-07, 'epoch': 0.95} + + 95%|█████████▍| 7005/7378 [24:00:48<1:16:17, 12.27s/it] + 95%|█████████▍| 7006/7378 [24:01:00<1:15:41, 12.21s/it] + +{'loss': 0.4598, 'learning_rate': 1.330606016415803e-07, 'epoch': 0.95} + + 95%|█████████▍| 7006/7378 [24:01:00<1:15:41, 12.21s/it] + 95%|█████████▍| 7007/7378 [24:01:12<1:15:25, 12.20s/it] + +{'loss': 0.4361, 'learning_rate': 1.3234776365346313e-07, 'epoch': 0.95} + + 95%|█████████▍| 7007/7378 [24:01:12<1:15:25, 12.20s/it] + 95%|█████████▍| 7008/7378 [24:01:25<1:15:45, 12.28s/it] + +{'loss': 0.4335, 'learning_rate': 1.3163682750066964e-07, 'epoch': 0.95} + + 95%|█████████▍| 7008/7378 [24:01:25<1:15:45, 12.28s/it] + 95%|█████████▍| 7009/7378 [24:01:36<1:14:36, 12.13s/it] + +{'loss': 0.3599, 'learning_rate': 1.3092779332022465e-07, 'epoch': 0.95} + + 95%|█████████▍| 7009/7378 [24:01:36<1:14:36, 12.13s/it] + 95%|█████████▌| 7010/7378 [24:01:49<1:15:31, 12.31s/it] + +{'loss': 0.4488, 'learning_rate': 1.3022066124878218e-07, 'epoch': 0.95} + + 
95%|█████████▌| 7010/7378 [24:01:49<1:15:31, 12.31s/it] + 95%|█████████▌| 7011/7378 [24:02:01<1:15:08, 12.28s/it] + +{'loss': 0.4491, 'learning_rate': 1.2951543142263101e-07, 'epoch': 0.95} + + 95%|█████████▌| 7011/7378 [24:02:01<1:15:08, 12.28s/it] + 95%|█████████▌| 7012/7378 [24:02:14<1:15:58, 12.45s/it] + +{'loss': 0.4708, 'learning_rate': 1.2881210397769461e-07, 'epoch': 0.95} + + 95%|█████████▌| 7012/7378 [24:02:14<1:15:58, 12.45s/it] + 95%|█████████▌| 7013/7378 [24:02:26<1:15:31, 12.41s/it] + +{'loss': 0.4393, 'learning_rate': 1.2811067904952567e-07, 'epoch': 0.95} + + 95%|█████████▌| 7013/7378 [24:02:26<1:15:31, 12.41s/it] + 95%|█████████▌| 7014/7378 [24:02:39<1:14:58, 12.36s/it] + +{'loss': 0.448, 'learning_rate': 1.2741115677331383e-07, 'epoch': 0.95} + + 95%|█████████▌| 7014/7378 [24:02:39<1:14:58, 12.36s/it] + 95%|█████████▌| 7015/7378 [24:02:51<1:14:06, 12.25s/it] + +{'loss': 0.4415, 'learning_rate': 1.2671353728388237e-07, 'epoch': 0.95} + + 95%|█████████▌| 7015/7378 [24:02:51<1:14:06, 12.25s/it] + 95%|█████████▌| 7016/7378 [24:03:03<1:14:01, 12.27s/it] + +{'loss': 0.4052, 'learning_rate': 1.260178207156848e-07, 'epoch': 0.95} + + 95%|█████████▌| 7016/7378 [24:03:03<1:14:01, 12.27s/it] + 95%|█████████▌| 7017/7378 [24:03:16<1:14:14, 12.34s/it] + +{'loss': 0.4149, 'learning_rate': 1.2532400720281057e-07, 'epoch': 0.95} + + 95%|█████████▌| 7017/7378 [24:03:16<1:14:14, 12.34s/it] + 95%|█████████▌| 7018/7378 [24:03:28<1:13:31, 12.25s/it] + +{'loss': 0.4516, 'learning_rate': 1.2463209687898047e-07, 'epoch': 0.95} + + 95%|█████████▌| 7018/7378 [24:03:28<1:13:31, 12.25s/it] + 95%|█████████▌| 7019/7378 [24:03:40<1:13:02, 12.21s/it] + +{'loss': 0.3874, 'learning_rate': 1.2394208987755008e-07, 'epoch': 0.95} + + 95%|█████████▌| 7019/7378 [24:03:40<1:13:02, 12.21s/it] + 95%|█████████▌| 7020/7378 [24:03:52<1:12:46, 12.20s/it] + +{'loss': 0.4216, 'learning_rate': 1.2325398633150742e-07, 'epoch': 0.95} + + 95%|█████████▌| 7020/7378 [24:03:52<1:12:46, 12.20s/it] + 
95%|█████████▌| 7021/7378 [24:04:04<1:12:15, 12.14s/it] + +{'loss': 0.4452, 'learning_rate': 1.2256778637347422e-07, 'epoch': 0.95} + + 95%|█████████▌| 7021/7378 [24:04:04<1:12:15, 12.14s/it] + 95%|█████████▌| 7022/7378 [24:04:16<1:12:40, 12.25s/it] + +{'loss': 0.4555, 'learning_rate': 1.2188349013570356e-07, 'epoch': 0.95} + + 95%|█████████▌| 7022/7378 [24:04:16<1:12:40, 12.25s/it] + 95%|█████████▌| 7023/7378 [24:04:28<1:12:01, 12.17s/it] + +{'loss': 0.3893, 'learning_rate': 1.2120109775008215e-07, 'epoch': 0.95} + + 95%|█████████▌| 7023/7378 [24:04:28<1:12:01, 12.17s/it] + 95%|█████████▌| 7024/7378 [24:04:41<1:12:07, 12.22s/it] + +{'loss': 0.4438, 'learning_rate': 1.205206093481337e-07, 'epoch': 0.95} + + 95%|█████████▌| 7024/7378 [24:04:41<1:12:07, 12.22s/it] + 95%|█████████▌| 7025/7378 [24:04:53<1:12:26, 12.31s/it] + +{'loss': 0.4581, 'learning_rate': 1.1984202506100883e-07, 'epoch': 0.95} + + 95%|█████████▌| 7025/7378 [24:04:53<1:12:26, 12.31s/it] + 95%|█████████▌| 7026/7378 [24:05:05<1:12:07, 12.29s/it] + +{'loss': 0.4323, 'learning_rate': 1.1916534501949406e-07, 'epoch': 0.95} + + 95%|█████████▌| 7026/7378 [24:05:05<1:12:07, 12.29s/it] + 95%|█████████▌| 7027/7378 [24:05:18<1:11:41, 12.26s/it] + +{'loss': 0.4006, 'learning_rate': 1.184905693540106e-07, 'epoch': 0.95} + + 95%|█████████▌| 7027/7378 [24:05:18<1:11:41, 12.26s/it] + 95%|█████████▌| 7028/7378 [24:05:30<1:11:24, 12.24s/it] + +{'loss': 0.3659, 'learning_rate': 1.1781769819460887e-07, 'epoch': 0.95} + + 95%|█████████▌| 7028/7378 [24:05:30<1:11:24, 12.24s/it] + 95%|█████████▌| 7029/7378 [24:05:42<1:10:42, 12.16s/it] + +{'loss': 0.4417, 'learning_rate': 1.1714673167097624e-07, 'epoch': 0.95} + + 95%|█████████▌| 7029/7378 [24:05:42<1:10:42, 12.16s/it] + 95%|█████████▌| 7030/7378 [24:05:54<1:11:14, 12.28s/it] + +{'loss': 0.4493, 'learning_rate': 1.1647766991243037e-07, 'epoch': 0.95} + + 95%|█████████▌| 7030/7378 [24:05:54<1:11:14, 12.28s/it] + 95%|█████████▌| 7031/7378 [24:06:07<1:11:16, 12.32s/it] + 
+{'loss': 0.4244, 'learning_rate': 1.1581051304792146e-07, 'epoch': 0.95} + + 95%|█████████▌| 7031/7378 [24:06:07<1:11:16, 12.32s/it] + 95%|█████████▌| 7032/7378 [24:06:19<1:11:00, 12.31s/it] + +{'loss': 0.4033, 'learning_rate': 1.1514526120603331e-07, 'epoch': 0.95} + + 95%|█████████▌| 7032/7378 [24:06:19<1:11:00, 12.31s/it] + 95%|█████████▌| 7033/7378 [24:06:31<1:10:27, 12.25s/it] + +{'loss': 0.4641, 'learning_rate': 1.1448191451498448e-07, 'epoch': 0.95} + + 95%|█████████▌| 7033/7378 [24:06:31<1:10:27, 12.25s/it] + 95%|█████████▌| 7034/7378 [24:06:44<1:10:46, 12.35s/it] + +{'loss': 0.4421, 'learning_rate': 1.1382047310262379e-07, 'epoch': 0.95} + + 95%|█████████▌| 7034/7378 [24:06:44<1:10:46, 12.35s/it] + 95%|█████████▌| 7035/7378 [24:06:56<1:10:55, 12.41s/it] + +{'loss': 0.4113, 'learning_rate': 1.1316093709643372e-07, 'epoch': 0.95} + + 95%|█████████▌| 7035/7378 [24:06:56<1:10:55, 12.41s/it] + 95%|█████████▌| 7036/7378 [24:07:09<1:10:46, 12.42s/it] + +{'loss': 0.4856, 'learning_rate': 1.1250330662352926e-07, 'epoch': 0.95} + + 95%|█████████▌| 7036/7378 [24:07:09<1:10:46, 12.42s/it] + 95%|█████████▌| 7037/7378 [24:07:21<1:09:44, 12.27s/it] + +{'loss': 0.4503, 'learning_rate': 1.1184758181065902e-07, 'epoch': 0.95} + + 95%|█████████▌| 7037/7378 [24:07:21<1:09:44, 12.27s/it] + 95%|█████████▌| 7038/7378 [24:07:33<1:09:50, 12.33s/it] + +{'loss': 0.4289, 'learning_rate': 1.1119376278420301e-07, 'epoch': 0.95} + + 95%|█████████▌| 7038/7378 [24:07:33<1:09:50, 12.33s/it] + 95%|█████████▌| 7039/7378 [24:07:45<1:09:21, 12.28s/it] + +{'loss': 0.4165, 'learning_rate': 1.105418496701749e-07, 'epoch': 0.95} + + 95%|█████████▌| 7039/7378 [24:07:45<1:09:21, 12.28s/it] + 95%|█████████▌| 7040/7378 [24:07:58<1:09:10, 12.28s/it] + +{'loss': 0.3865, 'learning_rate': 1.0989184259422081e-07, 'epoch': 0.95} + + 95%|█████████▌| 7040/7378 [24:07:58<1:09:10, 12.28s/it] + 95%|█████████▌| 7041/7378 [24:08:10<1:09:16, 12.33s/it] + +{'loss': 0.4335, 'learning_rate': 1.0924374168161833e-07, 
'epoch': 0.95} + + 95%|█████████▌| 7041/7378 [24:08:10<1:09:16, 12.33s/it] + 95%|█████████▌| 7042/7378 [24:08:22<1:08:42, 12.27s/it] + +{'loss': 0.3999, 'learning_rate': 1.0859754705728087e-07, 'epoch': 0.95} + + 95%|█████████▌| 7042/7378 [24:08:22<1:08:42, 12.27s/it] + 95%|█████████▌| 7043/7378 [24:08:34<1:07:59, 12.18s/it] + +{'loss': 0.4169, 'learning_rate': 1.0795325884575103e-07, 'epoch': 0.95} + + 95%|█████████▌| 7043/7378 [24:08:34<1:07:59, 12.18s/it] + 95%|█████████▌| 7044/7378 [24:08:47<1:08:12, 12.25s/it] + +{'loss': 0.4121, 'learning_rate': 1.0731087717120503e-07, 'epoch': 0.95} + + 95%|█████████▌| 7044/7378 [24:08:47<1:08:12, 12.25s/it] + 95%|█████████▌| 7045/7378 [24:08:59<1:07:51, 12.23s/it] + +{'loss': 0.4232, 'learning_rate': 1.0667040215745272e-07, 'epoch': 0.95} + + 95%|█████████▌| 7045/7378 [24:08:59<1:07:51, 12.23s/it] + 96%|█████████▌| 7046/7378 [24:09:11<1:07:45, 12.24s/it] + +{'loss': 0.4364, 'learning_rate': 1.0603183392793536e-07, 'epoch': 0.96} + + 96%|█████████▌| 7046/7378 [24:09:11<1:07:45, 12.24s/it] + 96%|█████████▌| 7047/7378 [24:09:23<1:07:51, 12.30s/it] + +{'loss': 0.4408, 'learning_rate': 1.0539517260572562e-07, 'epoch': 0.96} + + 96%|█████████▌| 7047/7378 [24:09:23<1:07:51, 12.30s/it] + 96%|█████████▌| 7048/7378 [24:09:35<1:07:08, 12.21s/it] + +{'loss': 0.4872, 'learning_rate': 1.0476041831353201e-07, 'epoch': 0.96} + + 96%|█████████▌| 7048/7378 [24:09:35<1:07:08, 12.21s/it] + 96%|█████████▌| 7049/7378 [24:09:48<1:06:59, 12.22s/it] + +{'loss': 0.4078, 'learning_rate': 1.0412757117369222e-07, 'epoch': 0.96} + + 96%|█████████▌| 7049/7378 [24:09:48<1:06:59, 12.22s/it] + 96%|█████████▌| 7050/7378 [24:10:00<1:06:57, 12.25s/it] + +{'loss': 0.4146, 'learning_rate': 1.0349663130817866e-07, 'epoch': 0.96} + + 96%|█████████▌| 7050/7378 [24:10:00<1:06:57, 12.25s/it] + 96%|█████████▌| 7051/7378 [24:10:12<1:06:52, 12.27s/it] + +{'loss': 0.4094, 'learning_rate': 1.0286759883859298e-07, 'epoch': 0.96} + + 96%|█████████▌| 7051/7378 
[24:10:12<1:06:52, 12.27s/it] + 96%|█████████▌| 7052/7378 [24:10:25<1:06:39, 12.27s/it] + +{'loss': 0.5114, 'learning_rate': 1.0224047388617375e-07, 'epoch': 0.96} + + 96%|█████████▌| 7052/7378 [24:10:25<1:06:39, 12.27s/it] + 96%|█████████▌| 7053/7378 [24:10:37<1:06:16, 12.24s/it] + +{'loss': 0.4431, 'learning_rate': 1.0161525657178872e-07, 'epoch': 0.96} + + 96%|█████████▌| 7053/7378 [24:10:37<1:06:16, 12.24s/it] + 96%|█████████▌| 7054/7378 [24:10:49<1:06:03, 12.23s/it] + +{'loss': 0.4315, 'learning_rate': 1.0099194701593817e-07, 'epoch': 0.96} + + 96%|█████████▌| 7054/7378 [24:10:49<1:06:03, 12.23s/it] + 96%|█████████▌| 7055/7378 [24:11:01<1:05:45, 12.21s/it] + +{'loss': 0.4199, 'learning_rate': 1.0037054533875601e-07, 'epoch': 0.96} + + 96%|█████████▌| 7055/7378 [24:11:01<1:05:45, 12.21s/it] + 96%|█████████▌| 7056/7378 [24:11:13<1:05:45, 12.25s/it] + +{'loss': 0.3951, 'learning_rate': 9.975105166000642e-08, 'epoch': 0.96} + + 96%|█████████▌| 7056/7378 [24:11:13<1:05:45, 12.25s/it] + 96%|█████████▌| 7057/7378 [24:11:26<1:05:28, 12.24s/it] + +{'loss': 0.4476, 'learning_rate': 9.913346609908836e-08, 'epoch': 0.96} + + 96%|█████████▌| 7057/7378 [24:11:26<1:05:28, 12.24s/it] + 96%|█████████▌| 7058/7378 [24:11:39<1:06:12, 12.41s/it] + +{'loss': 0.474, 'learning_rate': 9.851778877503215e-08, 'epoch': 0.96} + + 96%|█████████▌| 7058/7378 [24:11:39<1:06:12, 12.41s/it] + 96%|█████████▌| 7059/7378 [24:11:51<1:05:31, 12.33s/it] + +{'loss': 0.4534, 'learning_rate': 9.790401980649844e-08, 'epoch': 0.96} + + 96%|█████████▌| 7059/7378 [24:11:51<1:05:31, 12.33s/it] + 96%|█████████▌| 7060/7378 [24:12:03<1:05:22, 12.34s/it] + +{'loss': 0.4337, 'learning_rate': 9.729215931178149e-08, 'epoch': 0.96} + + 96%|█████████▌| 7060/7378 [24:12:03<1:05:22, 12.34s/it] + 96%|█████████▌| 7061/7378 [24:12:15<1:04:58, 12.30s/it] + +{'loss': 0.4636, 'learning_rate': 9.668220740881029e-08, 'epoch': 0.96} + + 96%|█████████▌| 7061/7378 [24:12:15<1:04:58, 12.30s/it] + 96%|█████████▌| 7062/7378 
[24:12:27<1:04:46, 12.30s/it] + +{'loss': 0.4047, 'learning_rate': 9.607416421514081e-08, 'epoch': 0.96} + + 96%|█████████▌| 7062/7378 [24:12:28<1:04:46, 12.30s/it] + 96%|█████████▌| 7063/7378 [24:12:40<1:05:03, 12.39s/it] + +{'loss': 0.4681, 'learning_rate': 9.546802984796489e-08, 'epoch': 0.96} + + 96%|█████████▌| 7063/7378 [24:12:40<1:05:03, 12.39s/it] + 96%|█████████▌| 7064/7378 [24:12:53<1:04:53, 12.40s/it] + +{'loss': 0.3417, 'learning_rate': 9.48638044241057e-08, 'epoch': 0.96} + + 96%|█████████▌| 7064/7378 [24:12:53<1:04:53, 12.40s/it] + 96%|█████████▌| 7065/7378 [24:13:05<1:05:12, 12.50s/it] + +{'loss': 0.426, 'learning_rate': 9.426148806001789e-08, 'epoch': 0.96} + + 96%|█████████▌| 7065/7378 [24:13:05<1:05:12, 12.50s/it] + 96%|█████████▌| 7066/7378 [24:13:17<1:04:35, 12.42s/it] + +{'loss': 0.4621, 'learning_rate': 9.36610808717886e-08, 'epoch': 0.96} + + 96%|█████████▌| 7066/7378 [24:13:17<1:04:35, 12.42s/it] + 96%|█████████▌| 7067/7378 [24:13:29<1:03:13, 12.20s/it] + +{'loss': 0.4193, 'learning_rate': 9.306258297513637e-08, 'epoch': 0.96} + + 96%|█████████▌| 7067/7378 [24:13:29<1:03:13, 12.20s/it] + 96%|█████████▌| 7068/7378 [24:13:41<1:02:48, 12.16s/it] + +{'loss': 0.4767, 'learning_rate': 9.246599448541337e-08, 'epoch': 0.96} + + 96%|█████████▌| 7068/7378 [24:13:41<1:02:48, 12.16s/it] + 96%|█████████▌| 7069/7378 [24:13:54<1:02:53, 12.21s/it] + +{'loss': 0.4439, 'learning_rate': 9.18713155176021e-08, 'epoch': 0.96} + + 96%|█████████▌| 7069/7378 [24:13:54<1:02:53, 12.21s/it] + 96%|█████████▌| 7070/7378 [24:14:06<1:02:59, 12.27s/it] + +{'loss': 0.4041, 'learning_rate': 9.127854618631637e-08, 'epoch': 0.96} + + 96%|█████████▌| 7070/7378 [24:14:06<1:02:59, 12.27s/it] + 96%|█████████▌| 7071/7378 [24:14:18<1:02:33, 12.23s/it] + +{'loss': 0.4233, 'learning_rate': 9.068768660580595e-08, 'epoch': 0.96} + + 96%|█████████▌| 7071/7378 [24:14:18<1:02:33, 12.23s/it] + 96%|█████████▌| 7072/7378 [24:14:30<1:02:13, 12.20s/it] + +{'loss': 0.4633, 'learning_rate': 
9.009873688994753e-08, 'epoch': 0.96} + + 96%|█████████▌| 7072/7378 [24:14:30<1:02:13, 12.20s/it] + 96%|█████████▌| 7073/7378 [24:14:42<1:01:49, 12.16s/it] + +{'loss': 0.4349, 'learning_rate': 8.951169715225249e-08, 'epoch': 0.96} + + 96%|█████████▌| 7073/7378 [24:14:42<1:01:49, 12.16s/it] + 96%|█████████▌| 7074/7378 [24:14:55<1:01:41, 12.17s/it] + +{'loss': 0.4411, 'learning_rate': 8.89265675058637e-08, 'epoch': 0.96} + + 96%|█████████▌| 7074/7378 [24:14:55<1:01:41, 12.17s/it] + 96%|█████████▌| 7075/7378 [24:15:07<1:01:47, 12.24s/it] + +{'loss': 0.4362, 'learning_rate': 8.834334806355649e-08, 'epoch': 0.96} + + 96%|█████████▌| 7075/7378 [24:15:07<1:01:47, 12.24s/it] + 96%|█████████▌| 7076/7378 [24:15:19<1:02:02, 12.33s/it] + +{'loss': 0.4049, 'learning_rate': 8.776203893773539e-08, 'epoch': 0.96} + + 96%|█████████▌| 7076/7378 [24:15:19<1:02:02, 12.33s/it] + 96%|█████████▌| 7077/7378 [24:15:31<1:01:21, 12.23s/it] + +{'loss': 0.4573, 'learning_rate': 8.718264024044077e-08, 'epoch': 0.96} + + 96%|█████████▌| 7077/7378 [24:15:31<1:01:21, 12.23s/it] + 96%|█████████▌| 7078/7378 [24:15:44<1:01:35, 12.32s/it] + +{'loss': 0.4775, 'learning_rate': 8.660515208334108e-08, 'epoch': 0.96} + + 96%|█████████▌| 7078/7378 [24:15:44<1:01:35, 12.32s/it] + 96%|█████████▌| 7079/7378 [24:15:56<1:01:18, 12.30s/it] + +{'loss': 0.4264, 'learning_rate': 8.602957457773842e-08, 'epoch': 0.96} + + 96%|█████████▌| 7079/7378 [24:15:56<1:01:18, 12.30s/it] + 96%|█████████▌| 7080/7378 [24:16:09<1:01:13, 12.33s/it] + +{'loss': 0.4228, 'learning_rate': 8.545590783456625e-08, 'epoch': 0.96} + + 96%|█████████▌| 7080/7378 [24:16:09<1:01:13, 12.33s/it] + 96%|█████████▌| 7081/7378 [24:16:21<1:01:16, 12.38s/it] + +{'loss': 0.4127, 'learning_rate': 8.488415196439059e-08, 'epoch': 0.96} + + 96%|█████████▌| 7081/7378 [24:16:21<1:01:16, 12.38s/it] + 96%|█████████▌| 7082/7378 [24:16:33<1:00:58, 12.36s/it] + +{'loss': 0.4379, 'learning_rate': 8.431430707740773e-08, 'epoch': 0.96} + + 96%|█████████▌| 7082/7378 
[24:16:33<1:00:58, 12.36s/it] + 96%|█████████▌| 7083/7378 [24:16:46<1:00:29, 12.30s/it] + +{'loss': 0.4378, 'learning_rate': 8.374637328344648e-08, 'epoch': 0.96} + + 96%|█████████▌| 7083/7378 [24:16:46<1:00:29, 12.30s/it] + 96%|█████████▌| 7084/7378 [24:16:58<59:57, 12.24s/it] + +{'loss': 0.3895, 'learning_rate': 8.318035069196817e-08, 'epoch': 0.96} + + 96%|█████████▌| 7084/7378 [24:16:58<59:57, 12.24s/it] + 96%|█████████▌| 7085/7378 [24:17:10<59:51, 12.26s/it] + +{'loss': 0.4533, 'learning_rate': 8.261623941206331e-08, 'epoch': 0.96} + + 96%|█████████▌| 7085/7378 [24:17:10<59:51, 12.26s/it] + 96%|█████████▌| 7086/7378 [24:17:22<59:41, 12.26s/it] + +{'loss': 0.448, 'learning_rate': 8.205403955245606e-08, 'epoch': 0.96} + + 96%|█████████▌| 7086/7378 [24:17:22<59:41, 12.26s/it] + 96%|█████████▌| 7087/7378 [24:17:35<59:53, 12.35s/it] + +{'loss': 0.4706, 'learning_rate': 8.149375122150193e-08, 'epoch': 0.96} + + 96%|█████████▌| 7087/7378 [24:17:35<59:53, 12.35s/it] + 96%|█████████▌| 7088/7378 [24:17:47<59:15, 12.26s/it] + +{'loss': 0.4569, 'learning_rate': 8.09353745271868e-08, 'epoch': 0.96} + + 96%|█████████▌| 7088/7378 [24:17:47<59:15, 12.26s/it] + 96%|█████████▌| 7089/7378 [24:17:59<59:23, 12.33s/it] + +{'loss': 0.4408, 'learning_rate': 8.037890957713013e-08, 'epoch': 0.96} + + 96%|█████████▌| 7089/7378 [24:17:59<59:23, 12.33s/it] + 96%|█████████▌| 7090/7378 [24:18:12<59:51, 12.47s/it] + +{'loss': 0.4898, 'learning_rate': 7.982435647858167e-08, 'epoch': 0.96} + + 96%|█████████▌| 7090/7378 [24:18:12<59:51, 12.47s/it] + 96%|█████████▌| 7091/7378 [24:18:25<1:00:03, 12.56s/it] + +{'loss': 0.4284, 'learning_rate': 7.92717153384226e-08, 'epoch': 0.96} + + 96%|█████████▌| 7091/7378 [24:18:25<1:00:03, 12.56s/it] + 96%|█████████▌| 7092/7378 [24:18:37<59:05, 12.40s/it] + +{'loss': 0.4289, 'learning_rate': 7.872098626316438e-08, 'epoch': 0.96} + + 96%|█████████▌| 7092/7378 [24:18:37<59:05, 12.40s/it] + 96%|█████████▌| 7093/7378 [24:18:49<58:05, 12.23s/it] + +{'loss': 
0.4183, 'learning_rate': 7.817216935895434e-08, 'epoch': 0.96} + + 96%|█████████▌| 7093/7378 [24:18:49<58:05, 12.23s/it] + 96%|█████████▌| 7094/7378 [24:19:01<58:05, 12.27s/it] + +{'loss': 0.4097, 'learning_rate': 7.762526473156561e-08, 'epoch': 0.96} + + 96%|█████████▌| 7094/7378 [24:19:01<58:05, 12.27s/it] + 96%|█████████▌| 7095/7378 [24:19:13<57:53, 12.27s/it] + +{'loss': 0.3837, 'learning_rate': 7.708027248640726e-08, 'epoch': 0.96} + + 96%|█████████▌| 7095/7378 [24:19:13<57:53, 12.27s/it] + 96%|█████████▌| 7096/7378 [24:19:26<57:40, 12.27s/it] + +{'loss': 0.3789, 'learning_rate': 7.653719272851745e-08, 'epoch': 0.96} + + 96%|█████████▌| 7096/7378 [24:19:26<57:40, 12.27s/it] + 96%|█████████▌| 7097/7378 [24:19:38<57:16, 12.23s/it] + +{'loss': 0.4389, 'learning_rate': 7.59960255625658e-08, 'epoch': 0.96} + + 96%|█████████▌| 7097/7378 [24:19:38<57:16, 12.23s/it] + 96%|█████████▌| 7098/7378 [24:19:50<57:18, 12.28s/it] + +{'loss': 0.4152, 'learning_rate': 7.545677109285443e-08, 'epoch': 0.96} + + 96%|█████████▌| 7098/7378 [24:19:50<57:18, 12.28s/it] + 96%|█████████▌| 7099/7378 [24:20:03<57:07, 12.28s/it] + +{'loss': 0.4114, 'learning_rate': 7.491942942331687e-08, 'epoch': 0.96} + + 96%|█████████▌| 7099/7378 [24:20:03<57:07, 12.28s/it] + 96%|█████████▌| 7100/7378 [24:20:14<56:22, 12.17s/it] + +{'loss': 0.4293, 'learning_rate': 7.438400065751584e-08, 'epoch': 0.96} + + 96%|█████████▌| 7100/7378 [24:20:14<56:22, 12.17s/it] + 96%|█████████▌| 7101/7378 [24:20:27<56:27, 12.23s/it] + +{'loss': 0.4114, 'learning_rate': 7.385048489864765e-08, 'epoch': 0.96} + + 96%|█████████▌| 7101/7378 [24:20:27<56:27, 12.23s/it] + 96%|█████████▋| 7102/7378 [24:20:39<55:59, 12.17s/it] + +{'loss': 0.4585, 'learning_rate': 7.331888224953787e-08, 'epoch': 0.96} + + 96%|█████████▋| 7102/7378 [24:20:39<55:59, 12.17s/it] + 96%|█████████▋| 7103/7378 [24:20:51<56:00, 12.22s/it] + +{'loss': 0.4212, 'learning_rate': 7.278919281264673e-08, 'epoch': 0.96} + + 96%|█████████▋| 7103/7378 [24:20:51<56:00, 
12.22s/it] + 96%|█████████▋| 7104/7378 [24:21:03<55:44, 12.21s/it] + +{'loss': 0.3998, 'learning_rate': 7.226141669006259e-08, 'epoch': 0.96} + + 96%|█████████▋| 7104/7378 [24:21:03<55:44, 12.21s/it] + 96%|█████████▋| 7105/7378 [24:21:15<55:10, 12.13s/it] + +{'loss': 0.4361, 'learning_rate': 7.173555398350518e-08, 'epoch': 0.96} + + 96%|█████████▋| 7105/7378 [24:21:15<55:10, 12.13s/it] + 96%|█████████▋| 7106/7378 [24:21:27<54:57, 12.12s/it] + +{'loss': 0.4215, 'learning_rate': 7.121160479432787e-08, 'epoch': 0.96} + + 96%|█████████▋| 7106/7378 [24:21:27<54:57, 12.12s/it] + 96%|█████████▋| 7107/7378 [24:21:40<55:04, 12.20s/it] + +{'loss': 0.4033, 'learning_rate': 7.068956922351211e-08, 'epoch': 0.96} + + 96%|█████████▋| 7107/7378 [24:21:40<55:04, 12.20s/it] + 96%|█████████▋| 7108/7378 [24:21:52<55:13, 12.27s/it] + +{'loss': 0.409, 'learning_rate': 7.016944737167297e-08, 'epoch': 0.96} + + 96%|█████████▋| 7108/7378 [24:21:52<55:13, 12.27s/it] + 96%|█████████▋| 7109/7378 [24:22:05<55:14, 12.32s/it] + +{'loss': 0.4159, 'learning_rate': 6.965123933905583e-08, 'epoch': 0.96} + + 96%|█████████▋| 7109/7378 [24:22:05<55:14, 12.32s/it] + 96%|█████████▋| 7110/7378 [24:22:17<54:31, 12.21s/it] + +{'loss': 0.427, 'learning_rate': 6.913494522553632e-08, 'epoch': 0.96} + + 96%|█████████▋| 7110/7378 [24:22:17<54:31, 12.21s/it] + 96%|█████████▋| 7111/7378 [24:22:29<54:37, 12.27s/it] + +{'loss': 0.4796, 'learning_rate': 6.862056513062266e-08, 'epoch': 0.96} + + 96%|█████████▋| 7111/7378 [24:22:29<54:37, 12.27s/it] + 96%|█████████▋| 7112/7378 [24:22:42<54:47, 12.36s/it] + +{'loss': 0.4296, 'learning_rate': 6.810809915345328e-08, 'epoch': 0.96} + + 96%|█████████▋| 7112/7378 [24:22:42<54:47, 12.36s/it] + 96%|█████████▋| 7113/7378 [24:22:54<54:09, 12.26s/it] + +{'loss': 0.4383, 'learning_rate': 6.759754739279923e-08, 'epoch': 0.96} + + 96%|█████████▋| 7113/7378 [24:22:54<54:09, 12.26s/it] + 96%|█████████▋| 7114/7378 [24:23:05<53:16, 12.11s/it] + +{'loss': 0.4346, 'learning_rate': 
6.708890994705952e-08, 'epoch': 0.96} + + 96%|█████████▋| 7114/7378 [24:23:05<53:16, 12.11s/it] + 96%|█████████▋| 7115/7378 [24:23:18<53:30, 12.21s/it] + +{'loss': 0.4671, 'learning_rate': 6.6582186914268e-08, 'epoch': 0.96} + + 96%|█████████▋| 7115/7378 [24:23:18<53:30, 12.21s/it] + 96%|█████████▋| 7116/7378 [24:23:30<53:56, 12.35s/it] + +{'loss': 0.4604, 'learning_rate': 6.607737839208428e-08, 'epoch': 0.96} + + 96%|█████████▋| 7116/7378 [24:23:30<53:56, 12.35s/it] + 96%|█████████▋| 7117/7378 [24:23:43<53:30, 12.30s/it] + +{'loss': 0.4898, 'learning_rate': 6.557448447780612e-08, 'epoch': 0.96} + + 96%|█████████▋| 7117/7378 [24:23:43<53:30, 12.30s/it] + 96%|█████████▋| 7118/7378 [24:23:55<52:59, 12.23s/it] + +{'loss': 0.3424, 'learning_rate': 6.507350526835709e-08, 'epoch': 0.96} + + 96%|█████████▋| 7118/7378 [24:23:55<52:59, 12.23s/it] + 96%|█████████▋| 7119/7378 [24:24:07<53:17, 12.35s/it] + +{'loss': 0.4314, 'learning_rate': 6.457444086029219e-08, 'epoch': 0.96} + + 96%|█████████▋| 7119/7378 [24:24:07<53:17, 12.35s/it] + 97%|█████████▋| 7120/7378 [24:24:20<53:01, 12.33s/it] + +{'loss': 0.4307, 'learning_rate': 6.40772913497989e-08, 'epoch': 0.97} + + 97%|█████████▋| 7120/7378 [24:24:20<53:01, 12.33s/it] + 97%|█████████▋| 7121/7378 [24:24:32<52:49, 12.33s/it] + +{'loss': 0.4322, 'learning_rate': 6.358205683269392e-08, 'epoch': 0.97} + + 97%|█████████▋| 7121/7378 [24:24:32<52:49, 12.33s/it] + 97%|█████████▋| 7122/7378 [24:24:44<52:10, 12.23s/it] + +{'loss': 0.4402, 'learning_rate': 6.308873740442867e-08, 'epoch': 0.97} + + 97%|█████████▋| 7122/7378 [24:24:44<52:10, 12.23s/it] + 97%|█████████▋| 7123/7378 [24:24:56<52:19, 12.31s/it] + +{'loss': 0.372, 'learning_rate': 6.259733316007932e-08, 'epoch': 0.97} + + 97%|█████████▋| 7123/7378 [24:24:56<52:19, 12.31s/it] + 97%|█████████▋| 7124/7378 [24:25:09<51:48, 12.24s/it] + +{'loss': 0.423, 'learning_rate': 6.21078441943601e-08, 'epoch': 0.97} + + 97%|█████████▋| 7124/7378 [24:25:09<51:48, 12.24s/it] + 97%|█████████▋| 
7125/7378 [24:25:22<52:32, 12.46s/it] + +{'loss': 0.423, 'learning_rate': 6.162027060160891e-08, 'epoch': 0.97} + + 97%|█████████▋| 7125/7378 [24:25:22<52:32, 12.46s/it] + 97%|█████████▋| 7126/7378 [24:25:34<52:18, 12.46s/it] + +{'loss': 0.4012, 'learning_rate': 6.113461247579944e-08, 'epoch': 0.97} + + 97%|█████████▋| 7126/7378 [24:25:34<52:18, 12.46s/it] + 97%|█████████▋| 7127/7378 [24:25:46<51:20, 12.27s/it] + +{'loss': 0.4575, 'learning_rate': 6.065086991053459e-08, 'epoch': 0.97} + + 97%|█████████▋| 7127/7378 [24:25:46<51:20, 12.27s/it] + 97%|█████████▋| 7128/7378 [24:25:58<51:12, 12.29s/it] + +{'loss': 0.3438, 'learning_rate': 6.016904299904869e-08, 'epoch': 0.97} + + 97%|█████████▋| 7128/7378 [24:25:58<51:12, 12.29s/it] + 97%|█████████▋| 7129/7378 [24:26:10<50:34, 12.19s/it] + +{'loss': 0.4782, 'learning_rate': 5.968913183420521e-08, 'epoch': 0.97} + + 97%|█████████▋| 7129/7378 [24:26:10<50:34, 12.19s/it] + 97%|█████████▋| 7130/7378 [24:26:22<50:20, 12.18s/it] + +{'loss': 0.4098, 'learning_rate': 5.921113650849908e-08, 'epoch': 0.97} + + 97%|█████████▋| 7130/7378 [24:26:22<50:20, 12.18s/it] + 97%|█████████▋| 7131/7378 [24:26:35<50:14, 12.21s/it] + +{'loss': 0.4049, 'learning_rate': 5.87350571140588e-08, 'epoch': 0.97} + + 97%|█████████▋| 7131/7378 [24:26:35<50:14, 12.21s/it] + 97%|█████████▋| 7132/7378 [24:26:47<49:52, 12.16s/it] + +{'loss': 0.4293, 'learning_rate': 5.826089374263988e-08, 'epoch': 0.97} + + 97%|█████████▋| 7132/7378 [24:26:47<49:52, 12.16s/it] + 97%|█████████▋| 7133/7378 [24:26:59<49:53, 12.22s/it] + +{'loss': 0.4204, 'learning_rate': 5.778864648562921e-08, 'epoch': 0.97} + + 97%|█████████▋| 7133/7378 [24:26:59<49:53, 12.22s/it] + 97%|█████████▋| 7134/7378 [24:27:11<49:55, 12.28s/it] + +{'loss': 0.444, 'learning_rate': 5.731831543404509e-08, 'epoch': 0.97} + + 97%|█████████▋| 7134/7378 [24:27:11<49:55, 12.28s/it] + 97%|█████████▋| 7135/7378 [24:27:23<49:32, 12.23s/it] + +{'loss': 0.377, 'learning_rate': 5.684990067853835e-08, 'epoch': 0.97} 
+ + 97%|█████████▋| 7135/7378 [24:27:23<49:32, 12.23s/it] + 97%|█████████▋| 7136/7378 [24:27:36<49:08, 12.18s/it] + +{'loss': 0.4712, 'learning_rate': 5.638340230938677e-08, 'epoch': 0.97} + + 97%|█████████▋| 7136/7378 [24:27:36<49:08, 12.18s/it] + 97%|█████████▋| 7137/7378 [24:27:48<49:23, 12.30s/it] + +{'loss': 0.4499, 'learning_rate': 5.5918820416500653e-08, 'epoch': 0.97} + + 97%|█████████▋| 7137/7378 [24:27:48<49:23, 12.30s/it] + 97%|█████████▋| 7138/7378 [24:28:00<49:00, 12.25s/it] + +{'loss': 0.4537, 'learning_rate': 5.54561550894217e-08, 'epoch': 0.97} + + 97%|█████████▋| 7138/7378 [24:28:00<49:00, 12.25s/it] + 97%|█████████▋| 7139/7378 [24:28:15<52:13, 13.11s/it] + +{'loss': 0.4649, 'learning_rate': 5.499540641731971e-08, 'epoch': 0.97} + + 97%|█████████▋| 7139/7378 [24:28:15<52:13, 13.11s/it] + 97%|█████████▋| 7140/7378 [24:28:27<50:40, 12.78s/it] + +{'loss': 0.4579, 'learning_rate': 5.4536574488999185e-08, 'epoch': 0.97} + + 97%|█████████▋| 7140/7378 [24:28:27<50:40, 12.78s/it] + 97%|█████████▋| 7141/7378 [24:28:40<50:07, 12.69s/it] + +{'loss': 0.3871, 'learning_rate': 5.407965939289161e-08, 'epoch': 0.97} + + 97%|█████████▋| 7141/7378 [24:28:40<50:07, 12.69s/it] + 97%|█████████▋| 7142/7378 [24:28:52<49:14, 12.52s/it] + +{'loss': 0.3941, 'learning_rate': 5.3624661217059895e-08, 'epoch': 0.97} + + 97%|█████████▋| 7142/7378 [24:28:52<49:14, 12.52s/it] + 97%|█████████▋| 7143/7378 [24:29:05<49:08, 12.55s/it] + +{'loss': 0.4408, 'learning_rate': 5.3171580049199425e-08, 'epoch': 0.97} + + 97%|█████████▋| 7143/7378 [24:29:05<49:08, 12.55s/it] + 97%|█████████▋| 7144/7378 [24:29:16<48:05, 12.33s/it] + +{'loss': 0.4652, 'learning_rate': 5.2720415976631465e-08, 'epoch': 0.97} + + 97%|█████████▋| 7144/7378 [24:29:16<48:05, 12.33s/it] + 97%|█████████▋| 7145/7378 [24:29:29<47:49, 12.31s/it] + +{'loss': 0.4623, 'learning_rate': 5.227116908631314e-08, 'epoch': 0.97} + + 97%|█████████▋| 7145/7378 [24:29:29<47:49, 12.31s/it] + 97%|█████████▋| 7146/7378 [24:29:41<47:34, 
12.31s/it] + +{'loss': 0.3899, 'learning_rate': 5.1823839464829605e-08, 'epoch': 0.97} + + 97%|█████████▋| 7146/7378 [24:29:41<47:34, 12.31s/it] + 97%|█████████▋| 7147/7378 [24:29:54<47:45, 12.41s/it] + +{'loss': 0.3525, 'learning_rate': 5.1378427198396364e-08, 'epoch': 0.97} + + 97%|█████████▋| 7147/7378 [24:29:54<47:45, 12.41s/it] + 97%|█████████▋| 7148/7378 [24:30:06<47:17, 12.34s/it] + +{'loss': 0.4047, 'learning_rate': 5.093493237285918e-08, 'epoch': 0.97} + + 97%|█████████▋| 7148/7378 [24:30:06<47:17, 12.34s/it] + 97%|█████████▋| 7149/7378 [24:30:18<47:13, 12.37s/it] + +{'loss': 0.4713, 'learning_rate': 5.049335507369524e-08, 'epoch': 0.97} + + 97%|█████████▋| 7149/7378 [24:30:18<47:13, 12.37s/it] + 97%|█████████▋| 7150/7378 [24:30:30<46:50, 12.33s/it] + +{'loss': 0.3735, 'learning_rate': 5.005369538601201e-08, 'epoch': 0.97} + + 97%|█████████▋| 7150/7378 [24:30:30<46:50, 12.33s/it] + 97%|█████████▋| 7151/7378 [24:30:43<46:23, 12.26s/it] + +{'loss': 0.4374, 'learning_rate': 4.9615953394545056e-08, 'epoch': 0.97} + + 97%|█████████▋| 7151/7378 [24:30:43<46:23, 12.26s/it] + 97%|█████████▋| 7152/7378 [24:30:55<46:00, 12.22s/it] + +{'loss': 0.4455, 'learning_rate': 4.918012918366466e-08, 'epoch': 0.97} + + 97%|█████████▋| 7152/7378 [24:30:55<46:00, 12.22s/it] + 97%|█████████▋| 7153/7378 [24:31:07<45:29, 12.13s/it] + +{'loss': 0.3924, 'learning_rate': 4.874622283736807e-08, 'epoch': 0.97} + + 97%|█████████▋| 7153/7378 [24:31:07<45:29, 12.13s/it] + 97%|█████████▋| 7154/7378 [24:31:19<45:12, 12.11s/it] + +{'loss': 0.439, 'learning_rate': 4.831423443928396e-08, 'epoch': 0.97} + + 97%|█████████▋| 7154/7378 [24:31:19<45:12, 12.11s/it] + 97%|█████████▋| 7155/7378 [24:31:31<45:10, 12.15s/it] + +{'loss': 0.4518, 'learning_rate': 4.788416407267127e-08, 'epoch': 0.97} + + 97%|█████████▋| 7155/7378 [24:31:31<45:10, 12.15s/it] + 97%|█████████▋| 7156/7378 [24:31:43<44:58, 12.15s/it] + +{'loss': 0.4697, 'learning_rate': 4.745601182042037e-08, 'epoch': 0.97} + + 97%|█████████▋| 
7156/7378 [24:31:43<44:58, 12.15s/it] + 97%|█████████▋| 7157/7378 [24:31:56<45:14, 12.28s/it] + +{'loss': 0.4295, 'learning_rate': 4.702977776504858e-08, 'epoch': 0.97} + + 97%|█████████▋| 7157/7378 [24:31:56<45:14, 12.28s/it] + 97%|█████████▋| 7158/7378 [24:32:08<45:04, 12.29s/it] + +{'loss': 0.4171, 'learning_rate': 4.6605461988707967e-08, 'epoch': 0.97} + + 97%|█████████▋| 7158/7378 [24:32:08<45:04, 12.29s/it] + 97%|█████████▋| 7159/7378 [24:32:20<44:26, 12.17s/it] + +{'loss': 0.4667, 'learning_rate': 4.618306457317756e-08, 'epoch': 0.97} + + 97%|█████████▋| 7159/7378 [24:32:20<44:26, 12.17s/it] + 97%|█████████▋| 7160/7378 [24:32:32<44:20, 12.20s/it] + +{'loss': 0.4779, 'learning_rate': 4.57625855998689e-08, 'epoch': 0.97} + + 97%|█████████▋| 7160/7378 [24:32:32<44:20, 12.20s/it] + 97%|█████████▋| 7161/7378 [24:32:44<44:06, 12.19s/it] + +{'loss': 0.4693, 'learning_rate': 4.5344025149821616e-08, 'epoch': 0.97} + + 97%|█████████▋| 7161/7378 [24:32:44<44:06, 12.19s/it] + 97%|█████████▋| 7162/7378 [24:32:57<44:14, 12.29s/it] + +{'loss': 0.4437, 'learning_rate': 4.4927383303706716e-08, 'epoch': 0.97} + + 97%|█████████▋| 7162/7378 [24:32:57<44:14, 12.29s/it] + 97%|█████████▋| 7163/7378 [24:33:09<44:03, 12.29s/it] + +{'loss': 0.4112, 'learning_rate': 4.451266014182665e-08, 'epoch': 0.97} + + 97%|█████████▋| 7163/7378 [24:33:09<44:03, 12.29s/it] + 97%|█████████▋| 7164/7378 [24:33:21<43:28, 12.19s/it] + +{'loss': 0.3564, 'learning_rate': 4.4099855744110796e-08, 'epoch': 0.97} + + 97%|█████████▋| 7164/7378 [24:33:21<43:28, 12.19s/it] + 97%|█████████▋| 7165/7378 [24:33:33<42:58, 12.11s/it] + +{'loss': 0.4297, 'learning_rate': 4.36889701901233e-08, 'epoch': 0.97} + + 97%|█████████▋| 7165/7378 [24:33:33<42:58, 12.11s/it] + 97%|█████████▋| 7166/7378 [24:33:45<42:43, 12.09s/it] + +{'loss': 0.4183, 'learning_rate': 4.328000355905415e-08, 'epoch': 0.97} + + 97%|█████████▋| 7166/7378 [24:33:45<42:43, 12.09s/it] + 97%|█████████▋| 7167/7378 [24:33:57<42:32, 12.10s/it] + +{'loss': 
0.4041, 'learning_rate': 4.2872955929724736e-08, 'epoch': 0.97} + + 97%|█████████▋| 7167/7378 [24:33:57<42:32, 12.10s/it] + 97%|█████████▋| 7168/7378 [24:34:09<42:23, 12.11s/it] + +{'loss': 0.4294, 'learning_rate': 4.2467827380588964e-08, 'epoch': 0.97} + + 97%|█████████▋| 7168/7378 [24:34:09<42:23, 12.11s/it] + 97%|█████████▋| 7169/7378 [24:34:22<42:34, 12.22s/it] + +{'loss': 0.3913, 'learning_rate': 4.206461798972772e-08, 'epoch': 0.97} + + 97%|█████████▋| 7169/7378 [24:34:22<42:34, 12.22s/it] + 97%|█████████▋| 7170/7378 [24:34:35<42:56, 12.38s/it] + +{'loss': 0.3714, 'learning_rate': 4.166332783485438e-08, 'epoch': 0.97} + + 97%|█████████▋| 7170/7378 [24:34:35<42:56, 12.38s/it] + 97%|█████████▋| 7171/7378 [24:34:47<42:53, 12.43s/it] + +{'loss': 0.4539, 'learning_rate': 4.126395699330932e-08, 'epoch': 0.97} + + 97%|█████████▋| 7171/7378 [24:34:47<42:53, 12.43s/it] + 97%|█████████▋| 7172/7378 [24:34:59<42:12, 12.29s/it] + +{'loss': 0.4015, 'learning_rate': 4.0866505542066506e-08, 'epoch': 0.97} + + 97%|█████████▋| 7172/7378 [24:34:59<42:12, 12.29s/it] + 97%|█████████▋| 7173/7378 [24:35:12<42:18, 12.38s/it] + +{'loss': 0.4469, 'learning_rate': 4.047097355772911e-08, 'epoch': 0.97} + + 97%|█████████▋| 7173/7378 [24:35:12<42:18, 12.38s/it] + 97%|█████████▋| 7174/7378 [24:35:24<42:10, 12.41s/it] + +{'loss': 0.4345, 'learning_rate': 4.007736111652838e-08, 'epoch': 0.97} + + 97%|█████████▋| 7174/7378 [24:35:24<42:10, 12.41s/it] + 97%|█████████▋| 7175/7378 [24:35:36<41:31, 12.27s/it] + +{'loss': 0.4707, 'learning_rate': 3.968566829432807e-08, 'epoch': 0.97} + + 97%|█████████▋| 7175/7378 [24:35:36<41:31, 12.27s/it] + 97%|█████████▋| 7176/7378 [24:35:49<41:28, 12.32s/it] + +{'loss': 0.4542, 'learning_rate': 3.929589516661891e-08, 'epoch': 0.97} + + 97%|█████████▋| 7176/7378 [24:35:49<41:28, 12.32s/it] + 97%|█████████▋| 7177/7378 [24:36:01<41:21, 12.34s/it] + +{'loss': 0.4318, 'learning_rate': 3.890804180852525e-08, 'epoch': 0.97} + + 97%|█████████▋| 7177/7378 
[24:36:01<41:21, 12.34s/it] + 97%|█████████▋| 7178/7378 [24:36:13<41:07, 12.34s/it] + +{'loss': 0.4179, 'learning_rate': 3.852210829479952e-08, 'epoch': 0.97} + + 97%|█████████▋| 7178/7378 [24:36:13<41:07, 12.34s/it] + 97%|█████████▋| 7179/7378 [24:36:26<40:54, 12.33s/it] + +{'loss': 0.4426, 'learning_rate': 3.8138094699824435e-08, 'epoch': 0.97} + + 97%|█████████▋| 7179/7378 [24:36:26<40:54, 12.33s/it] + 97%|█████████▋| 7180/7378 [24:36:38<40:37, 12.31s/it] + +{'loss': 0.4364, 'learning_rate': 3.7756001097611906e-08, 'epoch': 0.97} + + 97%|█████████▋| 7180/7378 [24:36:38<40:37, 12.31s/it] + 97%|█████████▋| 7181/7378 [24:36:51<41:03, 12.51s/it] + +{'loss': 0.4399, 'learning_rate': 3.737582756180525e-08, 'epoch': 0.97} + + 97%|█████████▋| 7181/7378 [24:36:51<41:03, 12.51s/it] + 97%|█████████▋| 7182/7378 [24:37:03<40:45, 12.48s/it] + +{'loss': 0.444, 'learning_rate': 3.699757416567584e-08, 'epoch': 0.97} + + 97%|█████████▋| 7182/7378 [24:37:03<40:45, 12.48s/it] + 97%|█████████▋| 7183/7378 [24:37:15<40:06, 12.34s/it] + +{'loss': 0.4117, 'learning_rate': 3.6621240982127606e-08, 'epoch': 0.97} + + 97%|█████████▋| 7183/7378 [24:37:15<40:06, 12.34s/it] + 97%|█████████▋| 7184/7378 [24:37:28<39:56, 12.35s/it] + +{'loss': 0.3886, 'learning_rate': 3.624682808369251e-08, 'epoch': 0.97} + + 97%|█████████▋| 7184/7378 [24:37:28<39:56, 12.35s/it] + 97%|█████████▋| 7185/7378 [24:37:40<39:28, 12.27s/it] + +{'loss': 0.4208, 'learning_rate': 3.587433554253172e-08, 'epoch': 0.97} + + 97%|█████████▋| 7185/7378 [24:37:40<39:28, 12.27s/it] + 97%|█████████▋| 7186/7378 [24:37:52<39:26, 12.33s/it] + +{'loss': 0.4392, 'learning_rate': 3.5503763430437823e-08, 'epoch': 0.97} + + 97%|█████████▋| 7186/7378 [24:37:52<39:26, 12.33s/it] + 97%|█████████▋| 7187/7378 [24:38:04<39:16, 12.34s/it] + +{'loss': 0.4633, 'learning_rate': 3.513511181883367e-08, 'epoch': 0.97} + + 97%|█████████▋| 7187/7378 [24:38:05<39:16, 12.34s/it] + 97%|█████████▋| 7188/7378 [24:38:16<38:41, 12.22s/it] + +{'loss': 0.4293, 
'learning_rate': 3.4768380778770204e-08, 'epoch': 0.97} + + 97%|█████████▋| 7188/7378 [24:38:16<38:41, 12.22s/it] + 97%|█████████▋| 7189/7378 [24:38:29<38:20, 12.17s/it] + +{'loss': 0.4653, 'learning_rate': 3.4403570380929785e-08, 'epoch': 0.97} + + 97%|█████████▋| 7189/7378 [24:38:29<38:20, 12.17s/it] + 97%|█████████▋| 7190/7378 [24:38:41<38:15, 12.21s/it] + +{'loss': 0.3996, 'learning_rate': 3.404068069562283e-08, 'epoch': 0.97} + + 97%|█████████▋| 7190/7378 [24:38:41<38:15, 12.21s/it] + 97%|█████████▋| 7191/7378 [24:38:54<38:33, 12.37s/it] + +{'loss': 0.4479, 'learning_rate': 3.367971179279006e-08, 'epoch': 0.97} + + 97%|█████████▋| 7191/7378 [24:38:54<38:33, 12.37s/it] + 97%|█████████▋| 7192/7378 [24:39:06<38:19, 12.36s/it] + +{'loss': 0.4483, 'learning_rate': 3.332066374200582e-08, 'epoch': 0.97} + + 97%|█████████▋| 7192/7378 [24:39:06<38:19, 12.36s/it] + 97%|█████████▋| 7193/7378 [24:39:18<37:40, 12.22s/it] + +{'loss': 0.3861, 'learning_rate': 3.2963536612466986e-08, 'epoch': 0.97} + + 97%|█████████▋| 7193/7378 [24:39:18<37:40, 12.22s/it] + 98%|█████████▊| 7194/7378 [24:39:30<37:40, 12.28s/it] + +{'loss': 0.3785, 'learning_rate': 3.2608330473007374e-08, 'epoch': 0.98} + + 98%|█████████▊| 7194/7378 [24:39:30<37:40, 12.28s/it] + 98%|█████████▊| 7195/7378 [24:39:43<37:29, 12.29s/it] + +{'loss': 0.4839, 'learning_rate': 3.2255045392085574e-08, 'epoch': 0.98} + + 98%|█████████▊| 7195/7378 [24:39:43<37:29, 12.29s/it] + 98%|█████████▊| 7196/7378 [24:39:55<37:27, 12.35s/it] + +{'loss': 0.4439, 'learning_rate': 3.190368143779266e-08, 'epoch': 0.98} + + 98%|█████████▊| 7196/7378 [24:39:55<37:27, 12.35s/it] + 98%|█████████▊| 7197/7378 [24:40:07<37:09, 12.32s/it] + +{'loss': 0.4526, 'learning_rate': 3.155423867784779e-08, 'epoch': 0.98} + + 98%|█████████▊| 7197/7378 [24:40:07<37:09, 12.32s/it] + 98%|█████████▊| 7198/7378 [24:40:21<38:05, 12.70s/it] + +{'loss': 0.435, 'learning_rate': 3.120671717960155e-08, 'epoch': 0.98} + + 98%|█████████▊| 7198/7378 [24:40:21<38:05, 
12.70s/it] + 98%|█████████▊| 7199/7378 [24:40:33<37:19, 12.51s/it] + +{'loss': 0.5007, 'learning_rate': 3.0861117010032584e-08, 'epoch': 0.98} + + 98%|█████████▊| 7199/7378 [24:40:33<37:19, 12.51s/it] + 98%|█████████▊| 7200/7378 [24:40:45<36:46, 12.40s/it] + +{'loss': 0.3624, 'learning_rate': 3.051743823574982e-08, 'epoch': 0.98} + + 98%|█████████▊| 7200/7378 [24:40:45<36:46, 12.40s/it] + 98%|█████████▊| 7201/7378 [24:40:57<36:13, 12.28s/it] + +{'loss': 0.4343, 'learning_rate': 3.017568092299139e-08, 'epoch': 0.98} + + 98%|█████████▊| 7201/7378 [24:40:57<36:13, 12.28s/it] + 98%|█████████▊| 7202/7378 [24:41:10<36:20, 12.39s/it] + +{'loss': 0.4094, 'learning_rate': 2.983584513762794e-08, 'epoch': 0.98} + + 98%|█████████▊| 7202/7378 [24:41:10<36:20, 12.39s/it] + 98%|█████████▊| 7203/7378 [24:41:22<35:58, 12.34s/it] + +{'loss': 0.4142, 'learning_rate': 2.949793094515485e-08, 'epoch': 0.98} + + 98%|█████████▊| 7203/7378 [24:41:22<35:58, 12.34s/it] + 98%|█████████▊| 7204/7378 [24:41:34<35:29, 12.24s/it] + +{'loss': 0.399, 'learning_rate': 2.916193841070114e-08, 'epoch': 0.98} + + 98%|█████████▊| 7204/7378 [24:41:34<35:29, 12.24s/it] + 98%|█████████▊| 7205/7378 [24:41:46<35:07, 12.18s/it] + +{'loss': 0.4315, 'learning_rate': 2.8827867599023896e-08, 'epoch': 0.98} + + 98%|█████████▊| 7205/7378 [24:41:46<35:07, 12.18s/it] + 98%|█████████▊| 7206/7378 [24:41:58<35:01, 12.22s/it] + +{'loss': 0.4437, 'learning_rate': 2.849571857450939e-08, 'epoch': 0.98} + + 98%|█████████▊| 7206/7378 [24:41:58<35:01, 12.22s/it] + 98%|█████████▊| 7207/7378 [24:42:10<34:40, 12.17s/it] + +{'loss': 0.4069, 'learning_rate': 2.8165491401176415e-08, 'epoch': 0.98} + + 98%|█████████▊| 7207/7378 [24:42:10<34:40, 12.17s/it] + 98%|█████████▊| 7208/7378 [24:42:22<34:19, 12.11s/it] + +{'loss': 0.3848, 'learning_rate': 2.7837186142668505e-08, 'epoch': 0.98} + + 98%|█████████▊| 7208/7378 [24:42:22<34:19, 12.11s/it] + 98%|█████████▊| 7209/7378 [24:42:34<34:04, 12.10s/it] + +{'loss': 0.4344, 'learning_rate': 
2.751080286226171e-08, 'epoch': 0.98} + + 98%|█████████▊| 7209/7378 [24:42:34<34:04, 12.10s/it] + 98%|█████████▊| 7210/7378 [24:42:47<33:58, 12.13s/it] + +{'loss': 0.3799, 'learning_rate': 2.7186341622862378e-08, 'epoch': 0.98} + + 98%|█████████▊| 7210/7378 [24:42:47<33:58, 12.13s/it] + 98%|█████████▊| 7211/7378 [24:42:59<33:46, 12.13s/it] + +{'loss': 0.3985, 'learning_rate': 2.686380248700493e-08, 'epoch': 0.98} + + 98%|█████████▊| 7211/7378 [24:42:59<33:46, 12.13s/it] + 98%|█████████▊| 7212/7378 [24:43:11<33:41, 12.18s/it] + +{'loss': 0.4874, 'learning_rate': 2.6543185516852977e-08, 'epoch': 0.98} + + 98%|█████████▊| 7212/7378 [24:43:11<33:41, 12.18s/it] + 98%|█████████▊| 7213/7378 [24:43:23<33:37, 12.23s/it] + +{'loss': 0.4012, 'learning_rate': 2.622449077420153e-08, 'epoch': 0.98} + + 98%|█████████▊| 7213/7378 [24:43:23<33:37, 12.23s/it] + 98%|█████████▊| 7214/7378 [24:43:36<33:39, 12.32s/it] + +{'loss': 0.4263, 'learning_rate': 2.5907718320473674e-08, 'epoch': 0.98} + + 98%|█████████▊| 7214/7378 [24:43:36<33:39, 12.32s/it] + 98%|█████████▊| 7215/7378 [24:43:48<33:26, 12.31s/it] + +{'loss': 0.444, 'learning_rate': 2.5592868216721688e-08, 'epoch': 0.98} + + 98%|█████████▊| 7215/7378 [24:43:48<33:26, 12.31s/it] + 98%|█████████▊| 7216/7378 [24:44:00<33:14, 12.31s/it] + +{'loss': 0.4536, 'learning_rate': 2.5279940523629253e-08, 'epoch': 0.98} + + 98%|█████████▊| 7216/7378 [24:44:00<33:14, 12.31s/it] + 98%|█████████▊| 7217/7378 [24:44:13<33:04, 12.32s/it] + +{'loss': 0.474, 'learning_rate': 2.4968935301507015e-08, 'epoch': 0.98} + + 98%|█████████▊| 7217/7378 [24:44:13<33:04, 12.32s/it] + 98%|█████████▊| 7218/7378 [24:44:25<32:52, 12.33s/it] + +{'loss': 0.4858, 'learning_rate': 2.465985261029591e-08, 'epoch': 0.98} + + 98%|█████████▊| 7218/7378 [24:44:25<32:52, 12.33s/it] + 98%|█████████▊| 7219/7378 [24:44:38<32:47, 12.37s/it] + +{'loss': 0.4338, 'learning_rate': 2.4352692509569397e-08, 'epoch': 0.98} + + 98%|█████████▊| 7219/7378 [24:44:38<32:47, 12.37s/it] + 
98%|█████████▊| 7220/7378 [24:44:50<32:36, 12.38s/it] + +{'loss': 0.4511, 'learning_rate': 2.404745505852457e-08, 'epoch': 0.98} + + 98%|█████████▊| 7220/7378 [24:44:50<32:36, 12.38s/it] + 98%|█████████▊| 7221/7378 [24:45:03<32:56, 12.59s/it] + +{'loss': 0.5134, 'learning_rate': 2.374414031599437e-08, 'epoch': 0.98} + + 98%|█████████▊| 7221/7378 [24:45:03<32:56, 12.59s/it] + 98%|█████████▊| 7222/7378 [24:45:15<32:02, 12.32s/it] + +{'loss': 0.4365, 'learning_rate': 2.344274834043425e-08, 'epoch': 0.98} + + 98%|█████████▊| 7222/7378 [24:45:15<32:02, 12.32s/it] + 98%|█████████▊| 7223/7378 [24:45:27<31:38, 12.25s/it] + +{'loss': 0.5073, 'learning_rate': 2.314327918993664e-08, 'epoch': 0.98} + + 98%|█████████▊| 7223/7378 [24:45:27<31:38, 12.25s/it] + 98%|█████████▊| 7224/7378 [24:45:39<31:33, 12.29s/it] + +{'loss': 0.4733, 'learning_rate': 2.284573292221759e-08, 'epoch': 0.98} + + 98%|█████████▊| 7224/7378 [24:45:39<31:33, 12.29s/it] + 98%|█████████▊| 7225/7378 [24:45:55<34:03, 13.36s/it] + +{'loss': 0.4416, 'learning_rate': 2.2550109594623447e-08, 'epoch': 0.98} + + 98%|█████████▊| 7225/7378 [24:45:55<34:03, 13.36s/it] + 98%|█████████▊| 7226/7378 [24:46:08<33:15, 13.13s/it] + +{'loss': 0.4557, 'learning_rate': 2.2256409264133082e-08, 'epoch': 0.98} + + 98%|█████████▊| 7226/7378 [24:46:08<33:15, 13.13s/it] + 98%|█████████▊| 7227/7378 [24:46:20<32:22, 12.86s/it] + +{'loss': 0.3944, 'learning_rate': 2.1964631987351214e-08, 'epoch': 0.98} + + 98%|█████████▊| 7227/7378 [24:46:20<32:22, 12.86s/it] + 98%|█████████▊| 7228/7378 [24:46:32<31:46, 12.71s/it] + +{'loss': 0.4467, 'learning_rate': 2.167477782051286e-08, 'epoch': 0.98} + + 98%|█████████▊| 7228/7378 [24:46:32<31:46, 12.71s/it] + 98%|█████████▊| 7229/7378 [24:46:45<31:22, 12.64s/it] + +{'loss': 0.3824, 'learning_rate': 2.1386846819485552e-08, 'epoch': 0.98} + + 98%|█████████▊| 7229/7378 [24:46:45<31:22, 12.64s/it] + 98%|█████████▊| 7230/7378 [24:46:57<30:39, 12.43s/it] + +{'loss': 0.4855, 'learning_rate': 
2.1100839039761566e-08, 'epoch': 0.98} + + 98%|█████████▊| 7230/7378 [24:46:57<30:39, 12.43s/it] + 98%|█████████▊| 7231/7378 [24:47:09<30:30, 12.45s/it] + +{'loss': 0.4185, 'learning_rate': 2.0816754536463478e-08, 'epoch': 0.98} + + 98%|█████████▊| 7231/7378 [24:47:09<30:30, 12.45s/it] + 98%|█████████▊| 7232/7378 [24:47:21<29:54, 12.29s/it] + +{'loss': 0.4072, 'learning_rate': 2.0534593364345267e-08, 'epoch': 0.98} + + 98%|█████████▊| 7232/7378 [24:47:21<29:54, 12.29s/it] + 98%|█████████▊| 7233/7378 [24:47:34<29:54, 12.38s/it] + +{'loss': 0.4571, 'learning_rate': 2.0254355577790096e-08, 'epoch': 0.98} + + 98%|█████████▊| 7233/7378 [24:47:34<29:54, 12.38s/it] + 98%|█████████▊| 7234/7378 [24:47:46<29:33, 12.31s/it] + +{'loss': 0.3695, 'learning_rate': 1.9976041230808097e-08, 'epoch': 0.98} + + 98%|█████████▊| 7234/7378 [24:47:46<29:33, 12.31s/it] + 98%|█████████▊| 7235/7378 [24:47:58<29:30, 12.38s/it] + +{'loss': 0.41, 'learning_rate': 1.9699650377039692e-08, 'epoch': 0.98} + + 98%|█████████▊| 7235/7378 [24:47:58<29:30, 12.38s/it] + 98%|█████████▊| 7236/7378 [24:48:11<29:17, 12.38s/it] + +{'loss': 0.474, 'learning_rate': 1.9425183069756716e-08, 'epoch': 0.98} + + 98%|█████████▊| 7236/7378 [24:48:11<29:17, 12.38s/it] + 98%|█████████▊| 7237/7378 [24:48:23<29:01, 12.35s/it] + +{'loss': 0.4119, 'learning_rate': 1.915263936185574e-08, 'epoch': 0.98} + + 98%|█████████▊| 7237/7378 [24:48:23<29:01, 12.35s/it] + 98%|█████████▊| 7238/7378 [24:48:35<28:35, 12.25s/it] + +{'loss': 0.4379, 'learning_rate': 1.888201930586697e-08, 'epoch': 0.98} + + 98%|█████████▊| 7238/7378 [24:48:35<28:35, 12.25s/it] + 98%|█████████▊| 7239/7378 [24:48:47<28:26, 12.28s/it] + +{'loss': 0.4625, 'learning_rate': 1.8613322953948688e-08, 'epoch': 0.98} + + 98%|█████████▊| 7239/7378 [24:48:47<28:26, 12.28s/it] + 98%|█████████▊| 7240/7378 [24:49:00<28:05, 12.22s/it] + +{'loss': 0.4656, 'learning_rate': 1.834655035788613e-08, 'epoch': 0.98} + + 98%|█████████▊| 7240/7378 [24:49:00<28:05, 12.22s/it] + 
98%|█████████▊| 7241/7378 [24:49:12<27:46, 12.17s/it] + +{'loss': 0.3875, 'learning_rate': 1.8081701569097055e-08, 'epoch': 0.98} + + 98%|█████████▊| 7241/7378 [24:49:12<27:46, 12.17s/it] + 98%|█████████▊| 7242/7378 [24:49:24<27:41, 12.21s/it] + +{'loss': 0.4356, 'learning_rate': 1.781877663862619e-08, 'epoch': 0.98} + + 98%|█████████▊| 7242/7378 [24:49:24<27:41, 12.21s/it] + 98%|█████████▊| 7243/7378 [24:49:36<27:35, 12.27s/it] + +{'loss': 0.509, 'learning_rate': 1.7557775617149663e-08, 'epoch': 0.98} + + 98%|█████████▊| 7243/7378 [24:49:36<27:35, 12.27s/it] + 98%|█████████▊| 7244/7378 [24:49:49<27:24, 12.27s/it] + +{'loss': 0.3924, 'learning_rate': 1.7298698554968352e-08, 'epoch': 0.98} + + 98%|█████████▊| 7244/7378 [24:49:49<27:24, 12.27s/it] + 98%|█████████▊| 7245/7378 [24:50:01<27:12, 12.28s/it] + +{'loss': 0.331, 'learning_rate': 1.7041545502018976e-08, 'epoch': 0.98} + + 98%|█████████▊| 7245/7378 [24:50:01<27:12, 12.28s/it] + 98%|█████████▊| 7246/7378 [24:50:13<27:04, 12.31s/it] + +{'loss': 0.4032, 'learning_rate': 1.6786316507859667e-08, 'epoch': 0.98} + + 98%|█████████▊| 7246/7378 [24:50:13<27:04, 12.31s/it] + 98%|█████████▊| 7247/7378 [24:50:26<26:58, 12.36s/it] + +{'loss': 0.4086, 'learning_rate': 1.6533011621685523e-08, 'epoch': 0.98} + + 98%|█████████▊| 7247/7378 [24:50:26<26:58, 12.36s/it] + 98%|█████████▊| 7248/7378 [24:50:38<26:42, 12.33s/it] + +{'loss': 0.4355, 'learning_rate': 1.628163089231527e-08, 'epoch': 0.98} + + 98%|█████████▊| 7248/7378 [24:50:38<26:42, 12.33s/it] + 98%|█████████▊| 7249/7378 [24:50:50<26:35, 12.37s/it] + +{'loss': 0.4317, 'learning_rate': 1.6032174368197927e-08, 'epoch': 0.98} + + 98%|█████████▊| 7249/7378 [24:50:50<26:35, 12.37s/it] + 98%|█████████▊| 7250/7378 [24:51:02<26:02, 12.20s/it] + +{'loss': 0.3794, 'learning_rate': 1.5784642097413928e-08, 'epoch': 0.98} + + 98%|█████████▊| 7250/7378 [24:51:02<26:02, 12.20s/it] + 98%|█████████▊| 7251/7378 [24:51:17<27:14, 12.87s/it] + +{'loss': 0.4761, 'learning_rate': 
1.553903412767066e-08, 'epoch': 0.98} + + 98%|█████████▊| 7251/7378 [24:51:17<27:14, 12.87s/it] + 98%|█████████▊| 7252/7378 [24:51:29<26:26, 12.59s/it] + +{'loss': 0.4577, 'learning_rate': 1.5295350506305816e-08, 'epoch': 0.98} + + 98%|█████████▊| 7252/7378 [24:51:29<26:26, 12.59s/it] + 98%|█████████▊| 7253/7378 [24:51:41<26:01, 12.49s/it] + +{'loss': 0.4047, 'learning_rate': 1.505359128028405e-08, 'epoch': 0.98} + + 98%|█████████▊| 7253/7378 [24:51:41<26:01, 12.49s/it] + 98%|█████████▊| 7254/7378 [24:51:53<25:49, 12.49s/it] + +{'loss': 0.4132, 'learning_rate': 1.4813756496201426e-08, 'epoch': 0.98} + + 98%|█████████▊| 7254/7378 [24:51:53<25:49, 12.49s/it] + 98%|█████████▊| 7255/7378 [24:52:06<25:29, 12.44s/it] + +{'loss': 0.4221, 'learning_rate': 1.4575846200282073e-08, 'epoch': 0.98} + + 98%|█████████▊| 7255/7378 [24:52:06<25:29, 12.44s/it] + 98%|█████████▊| 7256/7378 [24:52:18<24:58, 12.28s/it] + +{'loss': 0.4108, 'learning_rate': 1.4339860438381536e-08, 'epoch': 0.98} + + 98%|█████████▊| 7256/7378 [24:52:18<24:58, 12.28s/it] + 98%|█████████▊| 7257/7378 [24:52:30<24:39, 12.23s/it] + +{'loss': 0.4583, 'learning_rate': 1.4105799255978991e-08, 'epoch': 0.98} + + 98%|█████████▊| 7257/7378 [24:52:30<24:39, 12.23s/it] + 98%|█████████▊| 7258/7378 [24:52:42<24:18, 12.16s/it] + +{'loss': 0.4092, 'learning_rate': 1.3873662698188351e-08, 'epoch': 0.98} + + 98%|█████████▊| 7258/7378 [24:52:42<24:18, 12.16s/it] + 98%|█████████▊| 7259/7378 [24:52:54<24:10, 12.19s/it] + +{'loss': 0.5094, 'learning_rate': 1.364345080975049e-08, 'epoch': 0.98} + + 98%|█████████▊| 7259/7378 [24:52:54<24:10, 12.19s/it] + 98%|█████████▊| 7260/7378 [24:53:07<24:11, 12.30s/it] + +{'loss': 0.4179, 'learning_rate': 1.3415163635033257e-08, 'epoch': 0.98} + + 98%|█████████▊| 7260/7378 [24:53:07<24:11, 12.30s/it] + 98%|█████████▊| 7261/7378 [24:53:19<23:57, 12.29s/it] + +{'loss': 0.4802, 'learning_rate': 1.3188801218037007e-08, 'epoch': 0.98} + + 98%|█████████▊| 7261/7378 [24:53:19<23:57, 12.29s/it] +
98%|█████████▊| 7262/7378 [24:53:31<23:41, 12.26s/it] + +{'loss': 0.3562, 'learning_rate': 1.2964363602387953e-08, 'epoch': 0.98} + + 98%|█████████▊| 7262/7378 [24:53:31<23:41, 12.26s/it] + 98%|█████████▊| 7263/7378 [24:53:43<23:11, 12.10s/it] + +{'loss': 0.3861, 'learning_rate': 1.2741850831345936e-08, 'epoch': 0.98} + + 98%|█████████▊| 7263/7378 [24:53:43<23:11, 12.10s/it] + 98%|█████████▊| 7264/7378 [24:53:55<22:51, 12.03s/it] + +{'loss': 0.3773, 'learning_rate': 1.2521262947793322e-08, 'epoch': 0.98} + + 98%|█████████▊| 7264/7378 [24:53:55<22:51, 12.03s/it] + 98%|█████████▊| 7265/7378 [24:54:07<22:54, 12.17s/it] + +{'loss': 0.381, 'learning_rate': 1.2302599994247211e-08, 'epoch': 0.98} + + 98%|█████████▊| 7265/7378 [24:54:07<22:54, 12.17s/it] + 98%|█████████▊| 7266/7378 [24:54:19<22:44, 12.19s/it] + +{'loss': 0.4571, 'learning_rate': 1.2085862012850557e-08, 'epoch': 0.98} + + 98%|█████████▊| 7266/7378 [24:54:19<22:44, 12.19s/it] + 98%|█████████▊| 7267/7378 [24:54:31<22:31, 12.18s/it] + +{'loss': 0.4463, 'learning_rate': 1.1871049045376615e-08, 'epoch': 0.98} + + 98%|█████████▊| 7267/7378 [24:54:31<22:31, 12.18s/it] + 99%|█████████▊| 7268/7378 [24:54:44<22:24, 12.22s/it] + +{'loss': 0.4412, 'learning_rate': 1.1658161133227818e-08, 'epoch': 0.99} + + 99%|█████████▊| 7268/7378 [24:54:44<22:24, 12.22s/it] + 99%|█████████▊| 7269/7378 [24:54:56<22:16, 12.26s/it] + +{'loss': 0.4737, 'learning_rate': 1.1447198317433573e-08, 'epoch': 0.99} + + 99%|█████████▊| 7269/7378 [24:54:56<22:16, 12.26s/it] + 99%|█████████▊| 7270/7378 [24:55:08<22:06, 12.29s/it] + +{'loss': 0.3922, 'learning_rate': 1.1238160638653572e-08, 'epoch': 0.99} + + 99%|█████████▊| 7270/7378 [24:55:08<22:06, 12.29s/it] + 99%|█████████▊| 7271/7378 [24:55:21<21:57, 12.31s/it] + +{'loss': 0.42, 'learning_rate': 1.1031048137177813e-08, 'epoch': 0.99} + + 99%|█████████▊| 7271/7378 [24:55:21<21:57, 12.31s/it] + 99%|█████████▊| 7272/7378 [24:55:33<21:39, 12.26s/it] + +{'loss': 0.4681, 'learning_rate': 
1.0825860852923253e-08, 'epoch': 0.99} + + 99%|█████████▊| 7272/7378 [24:55:33<21:39, 12.26s/it] + 99%|█████████▊| 7273/7378 [24:55:45<21:19, 12.18s/it] + +{'loss': 0.4412, 'learning_rate': 1.0622598825437147e-08, 'epoch': 0.99} + + 99%|█████████▊| 7273/7378 [24:55:45<21:19, 12.18s/it] + 99%|█████████▊| 7274/7378 [24:55:58<21:25, 12.36s/it] + +{'loss': 0.4792, 'learning_rate': 1.0421262093894823e-08, 'epoch': 0.99} + + 99%|█████████▊| 7274/7378 [24:55:58<21:25, 12.36s/it] + 99%|█████████▊| 7275/7378 [24:56:10<21:18, 12.41s/it] + +{'loss': 0.4498, 'learning_rate': 1.0221850697100799e-08, 'epoch': 0.99} + + 99%|█████████▊| 7275/7378 [24:56:10<21:18, 12.41s/it] + 99%|█████████▊| 7276/7378 [24:56:23<21:05, 12.41s/it] + +{'loss': 0.4124, 'learning_rate': 1.0024364673487663e-08, 'epoch': 0.99} + + 99%|█████████▊| 7276/7378 [24:56:23<21:05, 12.41s/it] + 99%|█████████▊| 7277/7378 [24:56:35<20:48, 12.36s/it] + +{'loss': 0.4153, 'learning_rate': 9.828804061118303e-09, 'epoch': 0.99} + + 99%|█████████▊| 7277/7378 [24:56:35<20:48, 12.36s/it] + 99%|█████████▊| 7278/7378 [24:56:47<20:27, 12.28s/it] + +{'loss': 0.4095, 'learning_rate': 9.635168897684788e-09, 'epoch': 0.99} + + 99%|█████████▊| 7278/7378 [24:56:47<20:27, 12.28s/it] + 99%|█████████▊| 7279/7378 [24:56:59<20:12, 12.24s/it] + +{'loss': 0.4305, 'learning_rate': 9.443459220505046e-09, 'epoch': 0.99} + + 99%|█████████▊| 7279/7378 [24:56:59<20:12, 12.24s/it] + 99%|█████████▊| 7280/7378 [24:57:12<20:04, 12.29s/it] + +{'loss': 0.4456, 'learning_rate': 9.253675066530632e-09, 'epoch': 0.99} + + 99%|█████████▊| 7280/7378 [24:57:12<20:04, 12.29s/it] + 99%|█████████▊| 7281/7378 [24:57:24<19:47, 12.25s/it] + +{'loss': 0.4496, 'learning_rate': 9.06581647233895e-09, 'epoch': 0.99} + + 99%|█████████▊| 7281/7378 [24:57:24<19:47, 12.25s/it] + 99%|█████████▊| 7282/7378 [24:57:36<19:25, 12.14s/it] + +{'loss': 0.4451, 'learning_rate': 8.879883474135487e-09, 'epoch': 0.99} + + 99%|█████████▊| 7282/7378 [24:57:36<19:25, 12.14s/it] + 
99%|█████████▊| 7283/7378 [24:57:48<19:23, 12.25s/it] + +{'loss': 0.4241, 'learning_rate': 8.695876107757129e-09, 'epoch': 0.99} + + 99%|█████████▊| 7283/7378 [24:57:48<19:23, 12.25s/it] + 99%|█████████▊| 7284/7378 [24:58:00<19:14, 12.28s/it] + +{'loss': 0.4099, 'learning_rate': 8.513794408667731e-09, 'epoch': 0.99} + + 99%|█████████▊| 7284/7378 [24:58:00<19:14, 12.28s/it] + 99%|█████████▊| 7285/7378 [24:58:12<18:53, 12.18s/it] + +{'loss': 0.4349, 'learning_rate': 8.333638411960333e-09, 'epoch': 0.99} + + 99%|█████████▊| 7285/7378 [24:58:12<18:53, 12.18s/it] + 99%|█████████▉| 7286/7378 [24:58:25<18:39, 12.17s/it] + +{'loss': 0.4281, 'learning_rate': 8.155408152358268e-09, 'epoch': 0.99} + + 99%|█████████▉| 7286/7378 [24:58:25<18:39, 12.17s/it] + 99%|█████████▉| 7287/7378 [24:58:37<18:41, 12.33s/it] + +{'loss': 0.4227, 'learning_rate': 7.97910366421295e-09, 'epoch': 0.99} + + 99%|█████████▉| 7287/7378 [24:58:37<18:41, 12.33s/it] + 99%|█████████▉| 7288/7378 [24:58:50<18:35, 12.40s/it] + +{'loss': 0.4488, 'learning_rate': 7.804724981501644e-09, 'epoch': 0.99} + + 99%|█████████▉| 7288/7378 [24:58:50<18:35, 12.40s/it] + 99%|█████████▉| 7289/7378 [24:59:02<18:15, 12.31s/it] + +{'loss': 0.4, 'learning_rate': 7.632272137836349e-09, 'epoch': 0.99} + + 99%|█████████▉| 7289/7378 [24:59:02<18:15, 12.31s/it] + 99%|█████████▉| 7290/7378 [24:59:14<17:58, 12.25s/it] + +{'loss': 0.4446, 'learning_rate': 7.461745166453815e-09, 'epoch': 0.99} + + 99%|█████████▉| 7290/7378 [24:59:14<17:58, 12.25s/it] + 99%|█████████▉| 7291/7378 [24:59:26<17:43, 12.23s/it] + +{'loss': 0.4181, 'learning_rate': 7.293144100218863e-09, 'epoch': 0.99} + + 99%|█████████▉| 7291/7378 [24:59:26<17:43, 12.23s/it] + 99%|█████████▉| 7292/7378 [24:59:39<17:41, 12.34s/it] + +{'loss': 0.471, 'learning_rate': 7.12646897162772e-09, 'epoch': 0.99} + + 99%|█████████▉| 7292/7378 [24:59:39<17:41, 12.34s/it] + 99%|█████████▉| 7293/7378 [24:59:51<17:21, 12.26s/it] + +{'loss': 0.4139, 'learning_rate': 6.96171981280469e-09, 
'epoch': 0.99} + + 99%|█████████▉| 7293/7378 [24:59:51<17:21, 12.26s/it] + 99%|█████████▉| 7294/7378 [25:00:03<17:12, 12.29s/it] + +{'loss': 0.3597, 'learning_rate': 6.798896655502152e-09, 'epoch': 0.99} + + 99%|█████████▉| 7294/7378 [25:00:03<17:12, 12.29s/it] + 99%|█████████▉| 7295/7378 [25:00:15<16:53, 12.21s/it] + +{'loss': 0.3909, 'learning_rate': 6.637999531101669e-09, 'epoch': 0.99} + + 99%|█████████▉| 7295/7378 [25:00:15<16:53, 12.21s/it] + 99%|█████████▉| 7296/7378 [25:00:28<16:47, 12.29s/it] + +{'loss': 0.414, 'learning_rate': 6.479028470615101e-09, 'epoch': 0.99} + + 99%|█████████▉| 7296/7378 [25:00:28<16:47, 12.29s/it] + 99%|█████████▉| 7297/7378 [25:00:40<16:33, 12.27s/it] + +{'loss': 0.4999, 'learning_rate': 6.321983504679052e-09, 'epoch': 0.99} + + 99%|█████████▉| 7297/7378 [25:00:40<16:33, 12.27s/it] + 99%|█████████▉| 7298/7378 [25:00:52<16:15, 12.20s/it] + +{'loss': 0.4171, 'learning_rate': 6.166864663562644e-09, 'epoch': 0.99} + + 99%|█████████▉| 7298/7378 [25:00:52<16:15, 12.20s/it] + 99%|█████████▉| 7299/7378 [25:01:04<15:59, 12.14s/it] + +{'loss': 0.4246, 'learning_rate': 6.013671977164182e-09, 'epoch': 0.99} + + 99%|█████████▉| 7299/7378 [25:01:04<15:59, 12.14s/it] + 99%|█████████▉| 7300/7378 [25:01:16<15:53, 12.22s/it] + +{'loss': 0.4394, 'learning_rate': 5.862405475006716e-09, 'epoch': 0.99} + + 99%|█████████▉| 7300/7378 [25:01:16<15:53, 12.22s/it] + 99%|█████████▉| 7301/7378 [25:01:29<15:42, 12.24s/it] + +{'loss': 0.3781, 'learning_rate': 5.713065186245814e-09, 'epoch': 0.99} + + 99%|█████████▉| 7301/7378 [25:01:29<15:42, 12.24s/it] + 99%|█████████▉| 7302/7378 [25:01:41<15:24, 12.17s/it] + +{'loss': 0.4307, 'learning_rate': 5.565651139664008e-09, 'epoch': 0.99} + + 99%|█████████▉| 7302/7378 [25:01:41<15:24, 12.17s/it] + 99%|█████████▉| 7303/7378 [25:01:53<15:07, 12.10s/it] + +{'loss': 0.417, 'learning_rate': 5.420163363673014e-09, 'epoch': 0.99} + + 99%|█████████▉| 7303/7378 [25:01:53<15:07, 12.10s/it] + 99%|█████████▉| 7304/7378 
[25:02:05<15:01, 12.18s/it] + +{'loss': 0.489, 'learning_rate': 5.276601886313737e-09, 'epoch': 0.99} + + 99%|█████████▉| 7304/7378 [25:02:05<15:01, 12.18s/it] + 99%|█████████▉| 7305/7378 [25:02:17<14:53, 12.24s/it] + +{'loss': 0.427, 'learning_rate': 5.1349667352551535e-09, 'epoch': 0.99} + + 99%|█████████▉| 7305/7378 [25:02:17<14:53, 12.24s/it] + 99%|█████████▉| 7306/7378 [25:02:30<14:41, 12.24s/it] + +{'loss': 0.4005, 'learning_rate': 4.995257937795428e-09, 'epoch': 0.99} + + 99%|█████████▉| 7306/7378 [25:02:30<14:41, 12.24s/it] + 99%|█████████▉| 7307/7378 [25:02:42<14:28, 12.23s/it] + +{'loss': 0.4152, 'learning_rate': 4.8574755208608e-09, 'epoch': 0.99} + + 99%|█████████▉| 7307/7378 [25:02:42<14:28, 12.23s/it] + 99%|█████████▉| 7308/7378 [25:02:54<14:14, 12.21s/it] + +{'loss': 0.4329, 'learning_rate': 4.721619511006692e-09, 'epoch': 0.99} + + 99%|█████████▉| 7308/7378 [25:02:54<14:14, 12.21s/it] + 99%|█████████▉| 7309/7378 [25:03:06<13:58, 12.16s/it] + +{'loss': 0.3755, 'learning_rate': 4.587689934418826e-09, 'epoch': 0.99} + + 99%|█████████▉| 7309/7378 [25:03:06<13:58, 12.16s/it] + 99%|█████████▉| 7310/7378 [25:03:18<13:44, 12.13s/it] + +{'loss': 0.3933, 'learning_rate': 4.4556868169076674e-09, 'epoch': 0.99} + + 99%|█████████▉| 7310/7378 [25:03:18<13:44, 12.13s/it] + 99%|█████████▉| 7311/7378 [25:03:30<13:28, 12.07s/it] + +{'loss': 0.3645, 'learning_rate': 4.325610183915086e-09, 'epoch': 0.99} + + 99%|█████████▉| 7311/7378 [25:03:30<13:28, 12.07s/it] + 99%|█████████▉| 7312/7378 [25:03:42<13:14, 12.04s/it] + +{'loss': 0.4355, 'learning_rate': 4.197460060513248e-09, 'epoch': 0.99} + + 99%|█████████▉| 7312/7378 [25:03:42<13:14, 12.04s/it] + 99%|█████████▉| 7313/7378 [25:03:58<14:12, 13.12s/it] + +{'loss': 0.3394, 'learning_rate': 4.071236471399065e-09, 'epoch': 0.99} + + 99%|█████████▉| 7313/7378 [25:03:58<14:12, 13.12s/it] + 99%|█████████▉| 7314/7378 [25:04:13<14:44, 13.81s/it] + +{'loss': 0.4378, 'learning_rate': 3.946939440900855e-09, 'epoch': 0.99} + + 
99%|█████████▉| 7314/7378 [25:04:13<14:44, 13.81s/it] + 99%|█████████▉| 7315/7378 [25:04:25<13:56, 13.27s/it] + +{'loss': 0.4949, 'learning_rate': 3.8245689929750084e-09, 'epoch': 0.99} + + 99%|█████████▉| 7315/7378 [25:04:25<13:56, 13.27s/it] + 99%|█████████▉| 7316/7378 [25:04:37<13:26, 13.01s/it] + +{'loss': 0.4527, 'learning_rate': 3.7041251512071053e-09, 'epoch': 0.99} + + 99%|█████████▉| 7316/7378 [25:04:37<13:26, 13.01s/it] + 99%|█████████▉| 7317/7378 [25:04:53<14:07, 13.89s/it] + +{'loss': 0.4275, 'learning_rate': 3.5856079388096874e-09, 'epoch': 0.99} + + 99%|█████████▉| 7317/7378 [25:04:53<14:07, 13.89s/it] + 99%|█████████▉| 7318/7378 [25:05:06<13:24, 13.41s/it] + +{'loss': 0.4532, 'learning_rate': 3.4690173786255943e-09, 'epoch': 0.99} + + 99%|█████████▉| 7318/7378 [25:05:06<13:24, 13.41s/it] + 99%|█████████▉| 7319/7378 [25:05:21<13:49, 14.06s/it] + +{'loss': 0.4064, 'learning_rate': 3.3543534931257395e-09, 'epoch': 0.99} + + 99%|█████████▉| 7319/7378 [25:05:21<13:49, 14.06s/it] + 99%|█████████▉| 7320/7378 [25:05:36<13:52, 14.35s/it] + +{'loss': 0.469, 'learning_rate': 3.241616304410222e-09, 'epoch': 0.99} + + 99%|█████████▉| 7320/7378 [25:05:36<13:52, 14.35s/it] + 99%|█████████▉| 7321/7378 [25:05:58<15:37, 16.44s/it] + +{'loss': 0.4447, 'learning_rate': 3.1308058342072177e-09, 'epoch': 0.99} + + 99%|█████████▉| 7321/7378 [25:05:58<15:37, 16.44s/it] + 99%|█████████▉| 7322/7378 [25:06:10<14:06, 15.11s/it] + +{'loss': 0.4575, 'learning_rate': 3.0219221038729764e-09, 'epoch': 0.99} + + 99%|█████████▉| 7322/7378 [25:06:10<14:06, 15.11s/it] + 99%|█████████▉| 7323/7378 [25:06:22<13:03, 14.25s/it] + +{'loss': 0.4422, 'learning_rate': 2.9149651343940433e-09, 'epoch': 0.99} + + 99%|█████████▉| 7323/7378 [25:06:22<13:03, 14.25s/it] + 99%|█████████▉| 7324/7378 [25:06:34<12:13, 13.58s/it] + +{'loss': 0.4042, 'learning_rate': 2.80993494638393e-09, 'epoch': 0.99} + + 99%|█████████▉| 7324/7378 [25:06:34<12:13, 13.58s/it] + 99%|█████████▉| 7325/7378 [25:06:47<11:45, 
13.30s/it] + +{'loss': 0.4102, 'learning_rate': 2.706831560085332e-09, 'epoch': 0.99} + + 99%|█████████▉| 7325/7378 [25:06:47<11:45, 13.30s/it] + 99%|█████████▉| 7326/7378 [25:06:59<11:17, 13.04s/it] + +{'loss': 0.4196, 'learning_rate': 2.605654995371243e-09, 'epoch': 0.99} + + 99%|█████████▉| 7326/7378 [25:06:59<11:17, 13.04s/it] + 99%|█████████▉| 7327/7378 [25:07:11<10:54, 12.83s/it] + +{'loss': 0.4115, 'learning_rate': 2.506405271741619e-09, 'epoch': 0.99} + + 99%|█████████▉| 7327/7378 [25:07:11<10:54, 12.83s/it] + 99%|█████████▉| 7328/7378 [25:07:24<10:33, 12.68s/it] + +{'loss': 0.3879, 'learning_rate': 2.409082408323382e-09, 'epoch': 0.99} + + 99%|█████████▉| 7328/7378 [25:07:24<10:33, 12.68s/it] + 99%|█████████▉| 7329/7378 [25:07:36<10:17, 12.60s/it] + +{'loss': 0.3762, 'learning_rate': 2.3136864238759714e-09, 'epoch': 0.99} + + 99%|█████████▉| 7329/7378 [25:07:36<10:17, 12.60s/it] + 99%|█████████▉| 7330/7378 [25:07:48<09:59, 12.49s/it] + +{'loss': 0.4537, 'learning_rate': 2.22021733678468e-09, 'epoch': 0.99} + + 99%|█████████▉| 7330/7378 [25:07:48<09:59, 12.49s/it] + 99%|█████████▉| 7331/7378 [25:08:01<09:43, 12.43s/it] + +{'loss': 0.4208, 'learning_rate': 2.128675165065097e-09, 'epoch': 0.99} + + 99%|█████████▉| 7331/7378 [25:08:01<09:43, 12.43s/it] + 99%|█████████▉| 7332/7378 [25:08:13<09:29, 12.38s/it] + +{'loss': 0.4083, 'learning_rate': 2.0390599263586662e-09, 'epoch': 0.99} + + 99%|█████████▉| 7332/7378 [25:08:13<09:29, 12.38s/it] + 99%|█████████▉| 7333/7378 [25:08:25<09:11, 12.27s/it] + +{'loss': 0.4031, 'learning_rate': 1.951371637939348e-09, 'epoch': 0.99} + + 99%|█████████▉| 7333/7378 [25:08:25<09:11, 12.27s/it] + 99%|█████████▉| 7334/7378 [25:08:37<08:57, 12.22s/it] + +{'loss': 0.391, 'learning_rate': 1.8656103167058458e-09, 'epoch': 0.99} + + 99%|█████████▉| 7334/7378 [25:08:37<08:57, 12.22s/it] + 99%|█████████▉| 7335/7378 [25:08:49<08:44, 12.21s/it] + +{'loss': 0.4224, 'learning_rate': 1.781775979189382e-09, 'epoch': 0.99} + + 99%|█████████▉| 
7335/7378 [25:08:49<08:44, 12.21s/it] + 99%|█████████▉| 7336/7378 [25:09:01<08:33, 12.23s/it] + +{'loss': 0.3911, 'learning_rate': 1.699868641545921e-09, 'epoch': 0.99} + + 99%|█████████▉| 7336/7378 [25:09:01<08:33, 12.23s/it] + 99%|█████████▉| 7337/7378 [25:09:14<08:30, 12.46s/it] + +{'loss': 0.4668, 'learning_rate': 1.6198883195617244e-09, 'epoch': 0.99} + + 99%|█████████▉| 7337/7378 [25:09:14<08:30, 12.46s/it] + 99%|█████████▉| 7338/7378 [25:09:26<08:13, 12.35s/it] + +{'loss': 0.419, 'learning_rate': 1.541835028653349e-09, 'epoch': 0.99} + + 99%|█████████▉| 7338/7378 [25:09:26<08:13, 12.35s/it] + 99%|█████████▉| 7339/7378 [25:09:39<07:59, 12.28s/it] + +{'loss': 0.4371, 'learning_rate': 1.4657087838632067e-09, 'epoch': 0.99} + + 99%|█████████▉| 7339/7378 [25:09:39<07:59, 12.28s/it] + 99%|█████████▉| 7340/7378 [25:09:51<07:46, 12.27s/it] + +{'loss': 0.4052, 'learning_rate': 1.3915095998640049e-09, 'epoch': 0.99} + + 99%|█████████▉| 7340/7378 [25:09:51<07:46, 12.27s/it] + 99%|█████████▉| 7341/7378 [25:10:03<07:34, 12.27s/it] + +{'loss': 0.458, 'learning_rate': 1.3192374909565265e-09, 'epoch': 0.99} + + 99%|█████████▉| 7341/7378 [25:10:03<07:34, 12.27s/it] +100%|█████████▉| 7342/7378 [25:10:15<07:18, 12.18s/it] + +{'loss': 0.386, 'learning_rate': 1.2488924710696294e-09, 'epoch': 1.0} + +100%|█████████▉| 7342/7378 [25:10:15<07:18, 12.18s/it] +100%|█████████▉| 7343/7378 [25:10:27<07:06, 12.19s/it] + +{'loss': 0.4253, 'learning_rate': 1.1804745537602468e-09, 'epoch': 1.0} + +100%|█████████▉| 7343/7378 [25:10:27<07:06, 12.19s/it] +100%|█████████▉| 7344/7378 [25:10:40<06:57, 12.29s/it] + +{'loss': 0.3764, 'learning_rate': 1.1139837522167186e-09, 'epoch': 1.0} + +100%|█████████▉| 7344/7378 [25:10:40<06:57, 12.29s/it] +100%|█████████▉| 7345/7378 [25:10:52<06:47, 12.36s/it] + +{'loss': 0.4194, 'learning_rate': 1.0494200792532383e-09, 'epoch': 1.0} + +100%|█████████▉| 7345/7378 [25:10:52<06:47, 12.36s/it] +100%|█████████▉| 7346/7378 [25:11:05<06:35, 12.36s/it] + +{'loss': 
0.4154, 'learning_rate': 9.867835473142962e-10, 'epoch': 1.0} + +100%|█████████▉| 7346/7378 [25:11:05<06:35, 12.36s/it] +100%|█████████▉| 7347/7378 [25:11:17<06:20, 12.27s/it] + +{'loss': 0.4091, 'learning_rate': 9.260741684702368e-10, 'epoch': 1.0} + +100%|█████████▉| 7347/7378 [25:11:17<06:20, 12.27s/it] +100%|█████████▉| 7348/7378 [25:11:29<06:10, 12.33s/it] + +{'loss': 0.4502, 'learning_rate': 8.672919544228109e-10, 'epoch': 1.0} + +100%|█████████▉| 7348/7378 [25:11:29<06:10, 12.33s/it] +100%|█████████▉| 7349/7378 [25:11:42<05:59, 12.39s/it] + +{'loss': 0.4486, 'learning_rate': 8.104369165018444e-10, 'epoch': 1.0} + +100%|█████████▉| 7349/7378 [25:11:42<05:59, 12.39s/it] +100%|█████████▉| 7350/7378 [25:11:54<05:45, 12.32s/it] + +{'loss': 0.4056, 'learning_rate': 7.555090656652386e-10, 'epoch': 1.0} + +100%|█████████▉| 7350/7378 [25:11:54<05:45, 12.32s/it] +100%|█████████▉| 7351/7378 [25:12:06<05:32, 12.32s/it] + +{'loss': 0.4113, 'learning_rate': 7.025084124989701e-10, 'epoch': 1.0} + +100%|█████████▉| 7351/7378 [25:12:06<05:32, 12.32s/it] +100%|█████████▉| 7352/7378 [25:12:18<05:17, 12.21s/it] + +{'loss': 0.4954, 'learning_rate': 6.514349672182008e-10, 'epoch': 1.0} + +100%|█████████▉| 7352/7378 [25:12:18<05:17, 12.21s/it] +100%|█████████▉| 7353/7378 [25:12:30<05:04, 12.17s/it] + +{'loss': 0.4549, 'learning_rate': 6.022887396672782e-10, 'epoch': 1.0} + +100%|█████████▉| 7353/7378 [25:12:30<05:04, 12.17s/it] +100%|█████████▉| 7354/7378 [25:12:42<04:52, 12.17s/it] + +{'loss': 0.441, 'learning_rate': 5.550697393175153e-10, 'epoch': 1.0} + +100%|█████████▉| 7354/7378 [25:12:42<04:52, 12.17s/it] +100%|█████████▉| 7355/7378 [25:12:55<04:40, 12.19s/it] + +{'loss': 0.4358, 'learning_rate': 5.097779752694099e-10, 'epoch': 1.0} + +100%|█████████▉| 7355/7378 [25:12:55<04:40, 12.19s/it] +100%|█████████▉| 7356/7378 [25:13:07<04:28, 12.19s/it] + +{'loss': 0.4058, 'learning_rate': 4.664134562526456e-10, 'epoch': 1.0} + +100%|█████████▉| 7356/7378 [25:13:07<04:28, 12.19s/it] 
+100%|█████████▉| 7357/7378 [25:13:19<04:17, 12.24s/it] + +{'loss': 0.412, 'learning_rate': 4.2497619062498164e-10, 'epoch': 1.0} + +100%|█████████▉| 7357/7378 [25:13:19<04:17, 12.24s/it] +100%|█████████▉| 7358/7378 [25:13:32<04:06, 12.33s/it] + +{'loss': 0.3656, 'learning_rate': 3.85466186372252e-10, 'epoch': 1.0} + +100%|█████████▉| 7358/7378 [25:13:32<04:06, 12.33s/it] +100%|█████████▉| 7359/7378 [25:13:44<03:55, 12.41s/it] + +{'loss': 0.4, 'learning_rate': 3.4788345111058666e-10, 'epoch': 1.0} + +100%|█████████▉| 7359/7378 [25:13:44<03:55, 12.41s/it] +100%|█████████▉| 7360/7378 [25:13:57<03:44, 12.49s/it] + +{'loss': 0.4648, 'learning_rate': 3.122279920830806e-10, 'epoch': 1.0} + +100%|█████████▉| 7360/7378 [25:13:57<03:44, 12.49s/it] +100%|█████████▉| 7361/7378 [25:14:09<03:30, 12.36s/it] + +{'loss': 0.4768, 'learning_rate': 2.784998161620145e-10, 'epoch': 1.0} + +100%|█████████▉| 7361/7378 [25:14:09<03:30, 12.36s/it] +100%|█████████▉| 7362/7378 [25:14:21<03:14, 12.13s/it] + +{'loss': 0.4212, 'learning_rate': 2.466989298466338e-10, 'epoch': 1.0} + +100%|█████████▉| 7362/7378 [25:14:21<03:14, 12.13s/it] +100%|█████████▉| 7363/7378 [25:14:33<03:01, 12.10s/it] + +{'loss': 0.404, 'learning_rate': 2.1682533926759008e-10, 'epoch': 1.0} + +100%|█████████▉| 7363/7378 [25:14:33<03:01, 12.10s/it] +100%|█████████▉| 7364/7378 [25:14:45<02:49, 12.10s/it] + +{'loss': 0.4586, 'learning_rate': 1.8887905018138975e-10, 'epoch': 1.0} + +100%|█████████▉| 7364/7378 [25:14:45<02:49, 12.10s/it] +100%|█████████▉| 7365/7378 [25:14:57<02:38, 12.16s/it] + +{'loss': 0.4198, 'learning_rate': 1.628600679748349e-10, 'epoch': 1.0} + +100%|█████████▉| 7365/7378 [25:14:57<02:38, 12.16s/it] +100%|█████████▉| 7366/7378 [25:15:09<02:26, 12.19s/it] + +{'loss': 0.4477, 'learning_rate': 1.3876839766280292e-10, 'epoch': 1.0} + +100%|█████████▉| 7366/7378 [25:15:09<02:26, 12.19s/it] +100%|█████████▉| 7367/7378 [25:15:22<02:14, 12.24s/it] + +{'loss': 0.4552, 'learning_rate': 1.1660404388824653e-10, 
'epoch': 1.0} + +100%|█████████▉| 7367/7378 [25:15:22<02:14, 12.24s/it] +100%|█████████▉| 7368/7378 [25:15:34<02:02, 12.23s/it] + +{'loss': 0.4771, 'learning_rate': 9.636701092330392e-11, 'epoch': 1.0} + +100%|█████████▉| 7368/7378 [25:15:34<02:02, 12.23s/it] +100%|█████████▉| 7369/7378 [25:15:46<01:50, 12.30s/it] + +{'loss': 0.4722, 'learning_rate': 7.805730266818856e-11, 'epoch': 1.0} + +100%|█████████▉| 7369/7378 [25:15:46<01:50, 12.30s/it] +100%|█████████▉| 7370/7378 [25:15:59<01:37, 12.24s/it] + +{'loss': 0.3928, 'learning_rate': 6.167492265118924e-11, 'epoch': 1.0} + +100%|█████████▉| 7370/7378 [25:15:59<01:37, 12.24s/it] +100%|█████████▉| 7371/7378 [25:16:11<01:25, 12.22s/it] + +{'loss': 0.4075, 'learning_rate': 4.721987403089046e-11, 'epoch': 1.0} + +100%|█████████▉| 7371/7378 [25:16:11<01:25, 12.22s/it] +100%|█████████▉| 7372/7378 [25:16:23<01:13, 12.24s/it] + +{'loss': 0.4387, 'learning_rate': 3.469215959284178e-11, 'epoch': 1.0} + +100%|█████████▉| 7372/7378 [25:16:23<01:13, 12.24s/it] +100%|█████████▉| 7373/7378 [25:16:35<01:01, 12.27s/it] + +{'loss': 0.4139, 'learning_rate': 2.4091781751778287e-11, 'epoch': 1.0} + +100%|█████████▉| 7373/7378 [25:16:35<01:01, 12.27s/it] +100%|█████████▉| 7374/7378 [25:16:48<00:49, 12.27s/it] + +{'loss': 0.4374, 'learning_rate': 1.5418742549400122e-11, 'epoch': 1.0} + +100%|█████████▉| 7374/7378 [25:16:48<00:49, 12.27s/it] +100%|█████████▉| 7375/7378 [25:17:00<00:36, 12.29s/it] + +{'loss': 0.4468, 'learning_rate': 8.673043658813385e-12, 'epoch': 1.0} + +100%|█████████▉| 7375/7378 [25:17:00<00:36, 12.29s/it] +100%|█████████▉| 7376/7378 [25:17:12<00:24, 12.25s/it] + +{'loss': 0.4392, 'learning_rate': 3.854686380089234e-12, 'epoch': 1.0} + +100%|█████████▉| 7376/7378 [25:17:12<00:24, 12.25s/it] +100%|█████████▉| 7377/7378 [25:17:24<00:12, 12.19s/it] + +{'loss': 0.4561, 'learning_rate': 9.636716413741198e-13, 'epoch': 1.0} + +100%|█████████▉| 7377/7378 [25:17:24<00:12, 12.19s/it] +100%|██████████| 7378/7378 [25:17:40<00:00, 
13.22s/it] + +{'loss': 0.4294, 'learning_rate': 0.0, 'epoch': 1.0} + +100%|██████████| 7378/7378 [25:17:40<00:00, 13.22s/it][INFO|trainer.py:1962] 2025-01-24 01:40:37,430 >> + +Training completed. Do not forget to share your model on huggingface.co/models =) + + + + +{'train_runtime': 91067.1197, 'train_samples_per_second': 10.37, 'train_steps_per_second': 0.081, 'train_loss': 0.47721504658969577, 'epoch': 1.0} + +100%|██████████| 7378/7378 [25:17:40<00:00, 13.22s/it] +100%|██████████| 7378/7378 [25:17:40<00:00, 12.34s/it] +Rank 0: Only save projectors: False +[INFO|trainer.py:2936] 2025-01-24 01:41:04,782 >> Saving model checkpoint to ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt +[INFO|configuration_utils.py:473] 2025-01-24 01:41:04,785 >> Configuration saved in ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/config.json +[INFO|configuration_utils.py:594] 2025-01-24 01:41:04,786 >> Configuration saved in ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/generation_config.json +dlc1irjyfb0zt5ew-master-0:80:3222 [6] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:78:3221 [4] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:76:3223 [2] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO [Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO 
[Service thread] Connection closed by localRank 3 +dlc1irjyfb0zt5ew-master-0:80:3222 [6] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:78:3221 [4] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:76:3223 [2] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection closed by localRank 7 +dlc1irjyfb0zt5ew-master-0:78:3221 [4] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:80:3222 [6] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:76:3223 [2] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:76:364 [2] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:80:358 [6] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:78:363 [4] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection closed by localRank 5 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 4 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection closed by localRank 4 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 6 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection 
closed by localRank 6 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 1 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection closed by localRank 1 +dlc1irjyfb0zt5ew-master-0:74:3225 [0] NCCL INFO [Service thread] Connection closed by localRank 2 +dlc1irjyfb0zt5ew-master-0:74:365 [0] NCCL INFO [Service thread] Connection closed by localRank 2 +[INFO|modeling_utils.py:2501] 2025-01-24 01:41:46,678 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 14 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/model.safetensors.index.json. +[INFO|tokenization_utils_base.py:2433] 2025-01-24 01:41:46,680 >> tokenizer config file saved in ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/tokenizer_config.json +[INFO|tokenization_utils_base.py:2442] 2025-01-24 01:41:46,681 >> Special tokens file saved in ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/special_tokens_map.json +[INFO|tokenization_utils_base.py:2493] 2025-01-24 01:41:46,681 >> added tokens file saved in ./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/added_tokens.json +wandb: 🚀 View run llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt at: https://wandb.ai/openmmlab_zxy/huggingface/runs/x4s5p1ck +wandb: Find logs at: wandb/run-20250123_002254-x4s5p1ck/logs