[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING]
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] *****************************************
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] *****************************************
12/09/2023 15:26:57 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
12/09/2023 15:26:57 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/text-20231209-152643-1e-4/runs/Dec09_15-26-54_lily-gpu07,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=500,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=output/text-20231209-152643-1e-4,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/text-20231209-152643-1e-4,
save_on_each_node=False,
save_safetensors=True,
save_steps=50,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer.model from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer.model
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer_config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer.json from cache at None
[INFO|configuration_utils.py:717] 2023-12-09 15:26:58,534 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
12/09/2023 15:26:58 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
[INFO|configuration_utils.py:717] 2023-12-09 15:26:58,785 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:777] 2023-12-09 15:26:58,786 >> Model config ChatGLMConfig {
  "_name_or_path": "THUDM/chatglm3-6b-base",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "THUDM/chatglm3-6b-base--configuration_chatglm.ChatGLMConfig",
    "AutoModel": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1e-05,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_layers": 28,
  "original_rope": true,
  "pad_token_id": 0,
  "padded_vocab_size": 65024,
  "post_layer_norm": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "rmsnorm": true,
  "seq_length": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.35.2",
  "use_cache": true,
  "vocab_size": 65024
}
[INFO|modeling_utils.py:3121] 2023-12-09 15:27:00,104 >> loading weights file pytorch_model.bin from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/pytorch_model.bin.index.json
[INFO|configuration_utils.py:791] 2023-12-09 15:27:00,113 >> Generate config GenerationConfig {
  "eos_token_id": 2,
  "pad_token_id": 0
}
Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3958] 2023-12-09 15:27:15,240 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at THUDM/chatglm3-6b-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3525] 2023-12-09 15:27:15,493 >> Generation config file not found, using a generation config created from the model config.
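How to read the `Sanity Check` dump that follows: each entry has the form `'token': input_id -> label`. The prompt (`[gMASK]`, `sop`, `Instruction: ... Answer:`) and every padding position (`pad_token_id` 0) get the label `-100`. The response tokens and the EOS id 2 keep their own ids as labels. `-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so presumably only the response and EOS positions contribute to the loss. The sketch below reproduces that masking pattern under stated assumptions. The helper `build_labels`, the right-truncation rule, and the toy `max_length` are hypothetical and are not taken from the fine-tuning script that produced this log.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_TOKEN_ID = 0     # "pad_token_id": 0 in the ChatGLMConfig above
EOS_TOKEN_ID = 2     # "eos_token_id": 2 in the ChatGLMConfig above


def build_labels(prompt_ids: List[int], response_ids: List[int],
                 max_length: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: concatenate prompt + response + EOS, pad to
    max_length, and mask prompt and padding positions with IGNORE_INDEX."""
    input_ids = prompt_ids + response_ids + [EOS_TOKEN_ID]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [EOS_TOKEN_ID]
    if len(input_ids) > max_length:
        # Assumption: truncate from the right; the real script may differ.
        input_ids, labels = input_ids[:max_length], labels[:max_length]
    pad = max_length - len(input_ids)
    input_ids = input_ids + [PAD_TOKEN_ID] * pad
    labels = labels + [IGNORE_INDEX] * pad
    return input_ids, labels


# A shortened, illustrative prompt/response built from ids that appear in the log.
prompt = [64790, 64792, 29101, 30954, 30910]  # '[gMASK]', 'sop', 'Instruction', ':', ''
response = [30910, 30939, 30930]              # '', '1', '.'
ids, labels = build_labels(prompt, response, max_length=12)
for t, m in zip(ids, labels):
    print(f"{t:6d} -> {m:6d}")
```

With `max_length=12`, the five prompt positions and the three trailing padding positions print `-100`. This mirrors the pattern in the log, where only the answer text and the final `2` keep their ids.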
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.05s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.19s/it]
Train dataset size: 52002
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
'': 0 -> -100
...
<<<<<<<<<<<<< Sanity Check
Train dataset size: 52002
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
'': 0 -> -100
...
'': 0
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 <<<<<<<<<<<<< Sanity Check [INFO|trainer.py:544] 2023-12-09 15:27:20,460 >> max_steps is given, it will override any value given in num_train_epochs [INFO|trainer.py:1723] 2023-12-09 15:27:22,980 >> ***** Running training ***** [INFO|trainer.py:1724] 2023-12-09 15:27:22,981 >> Num examples = 52,002 [INFO|trainer.py:1725] 2023-12-09 15:27:22,981 >> Num Epochs = 1 [INFO|trainer.py:1726] 2023-12-09 15:27:22,981 >> Instantaneous batch size per device = 1 [INFO|trainer.py:1729] 2023-12-09 15:27:22,981 >> Total train batch size (w. 
parallel, distributed & accumulation) = 2
[INFO|trainer.py:1730] 2023-12-09 15:27:22,981 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1731] 2023-12-09 15:27:22,981 >> Total optimization steps = 500
[INFO|trainer.py:1732] 2023-12-09 15:27:22,983 >> Number of trainable parameters = 1,949,696
0%| | 0/500 [00:00> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-50/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:27:38,144 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-50/special_tokens_map.json
10%|█ | 51/500 [00:14<01:51, 4.03it/s] {'loss': 0.0, 'learning_rate': 8.98e-05, 'epoch': 0.0} 10%|█ | 51/500 [00:14<01:51, 4.03it/s] 10%|█ | 52/500 [00:14<01:49, 4.08it/s] {'loss': 0.0, 'learning_rate': 8.960000000000001e-05, 'epoch': 0.0} 10%|█ | 52/500 [00:14<01:49, 4.08it/s] 11%|█ | 53/500 [00:15<01:47, 4.14it/s] {'loss': 0.0, 'learning_rate': 8.94e-05, 'epoch': 0.0} 11%|█ | 53/500 [00:15<01:47, 4.14it/s] 11%|█ | 54/500 [00:15<01:46, 4.20it/s] {'loss': 0.0, 'learning_rate': 8.92e-05, 'epoch': 0.0} 11%|█ | 54/500 [00:15<01:46, 4.20it/s] 11%|█ | 55/500 [00:15<01:45, 4.24it/s] {'loss': 0.0, 'learning_rate': 8.900000000000001e-05, 'epoch': 0.0} 11%|█ | 55/500 [00:15<01:45, 4.24it/s] 11%|█ | 56/500 [00:15<01:43, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.88e-05, 'epoch': 0.0} 11%|█ | 56/500 [00:15<01:43, 4.27it/s] 11%|█▏ | 57/500 [00:16<01:43, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.86e-05, 'epoch': 0.0} 11%|█▏ | 57/500 [00:16<01:43, 4.29it/s] 12%|█▏ | 58/500 [00:16<01:43, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.840000000000001e-05, 'epoch': 0.0} 12%|█▏ | 58/500 [00:16<01:43, 4.27it/s] 12%|█▏ | 59/500 [00:16<01:43, 4.28it/s] {'loss': 0.0, 'learning_rate': 8.82e-05, 'epoch': 0.0} 12%|█▏ | 59/500 [00:16<01:43, 4.28it/s] 12%|█▏ | 60/500 [00:16<01:42, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.0} 12%|█▏ | 60/500 [00:16<01:42, 4.29it/s] 12%|█▏ | 61/500
[00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.78e-05, 'epoch': 0.0} 12%|█▏ | 61/500 [00:17<01:41, 4.31it/s] 12%|█▏ | 62/500 [00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.76e-05, 'epoch': 0.0} 12%|█▏ | 62/500 [00:17<01:41, 4.31it/s] 13%|█▎ | 63/500 [00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.740000000000001e-05, 'epoch': 0.0} 13%|█▎ | 63/500 [00:17<01:41, 4.31it/s] 13%|█▎ | 64/500 [00:17<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.72e-05, 'epoch': 0.0} 13%|█▎ | 64/500 [00:17<01:40, 4.32it/s] 13%|█▎ | 65/500 [00:17<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.7e-05, 'epoch': 0.0} 13%|█▎ | 65/500 [00:17<01:40, 4.32it/s] 13%|█▎ | 66/500 [00:18<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.680000000000001e-05, 'epoch': 0.0} 13%|█▎ | 66/500 [00:18<01:40, 4.32it/s] 13%|█▎ | 67/500 [00:18<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.66e-05, 'epoch': 0.0} 13%|█▎ | 67/500 [00:18<01:40, 4.32it/s] 14%|█▎ | 68/500 [00:18<01:39, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.64e-05, 'epoch': 0.0} 14%|█▎ | 68/500 [00:18<01:39, 4.34it/s] 14%|█▍ | 69/500 [00:18<01:39, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.620000000000001e-05, 'epoch': 0.0} 14%|█▍ | 69/500 [00:18<01:39, 4.32it/s] 14%|█▍ | 70/500 [00:19<01:39, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.6e-05, 'epoch': 0.0} 14%|█▍ | 70/500 [00:19<01:39, 4.33it/s] 14%|█▍ | 71/500 [00:19<01:38, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.58e-05, 'epoch': 0.0} 14%|█▍ | 71/500 [00:19<01:38, 4.34it/s] 14%|█▍ | 72/500 [00:19<01:38, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.560000000000001e-05, 'epoch': 0.0} 14%|█▍ | 72/500 [00:19<01:38, 4.33it/s] 15%|█▍ | 73/500 [00:19<01:38, 4.35it/s] {'loss': 0.0, 'learning_rate': 8.54e-05, 'epoch': 0.0} 15%|█▍ | 73/500 [00:19<01:38, 4.35it/s] 15%|█▍ | 74/500 [00:20<01:38, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.52e-05, 'epoch': 0.0} 15%|█▍ | 74/500 [00:20<01:38, 4.33it/s] 15%|█▌ | 75/500 [00:20<01:38, 4.32it/s] {'loss': 0.0, 'learning_rate': 
8.5e-05, 'epoch': 0.0} 15%|█▌ | 75/500 [00:20<01:38, 4.32it/s] 15%|█▌ | 76/500 [00:20<01:37, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.48e-05, 'epoch': 0.0} 15%|█▌ | 76/500 [00:20<01:37, 4.34it/s] 15%|█▌ | 77/500 [00:20<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.46e-05, 'epoch': 0.0} 15%|█▌ | 77/500 [00:20<01:37, 4.33it/s] 16%|█▌ | 78/500 [00:20<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.44e-05, 'epoch': 0.0} 16%|█▌ | 78/500 [00:20<01:37, 4.33it/s] 16%|█▌ | 79/500 [00:21<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.42e-05, 'epoch': 0.0} 16%|█▌ | 79/500 [00:21<01:37, 4.33it/s] 16%|█▌ | 80/500 [00:21<01:36, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.4e-05, 'epoch': 0.0} 16%|█▌ | 80/500 [00:21<01:36, 4.33it/s] 16%|█▌ | 81/500 [00:21<01:36, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.38e-05, 'epoch': 0.0} 16%|█▌ | 81/500 [00:21<01:36, 4.32it/s] 16%|█▋ | 82/500 [00:21<01:37, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.36e-05, 'epoch': 0.0} 16%|█▋ | 82/500 [00:21<01:37, 4.30it/s] 17%|█▋ | 83/500 [00:22<01:37, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.34e-05, 'epoch': 0.0} 17%|█▋ | 83/500 [00:22<01:37, 4.29it/s] 17%|█▋ | 84/500 [00:22<01:36, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.32e-05, 'epoch': 0.0} 17%|█▋ | 84/500 [00:22<01:36, 4.29it/s] 17%|█▋ | 85/500 [00:22<01:36, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.3e-05, 'epoch': 0.0} 17%|█▋ | 85/500 [00:22<01:36, 4.30it/s] 17%|█▋ | 86/500 [00:22<01:36, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.28e-05, 'epoch': 0.0} 17%|█▋ | 86/500 [00:22<01:36, 4.30it/s] 17%|█▋ | 87/500 [00:23<01:36, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.26e-05, 'epoch': 0.0} 17%|█▋ | 87/500 [00:23<01:36, 4.27it/s] 18%|█▊ | 88/500 [00:23<01:36, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.24e-05, 'epoch': 0.0} 18%|█▊ | 88/500 [00:23<01:36, 4.29it/s] 18%|█▊ | 89/500 [00:23<01:35, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.22e-05, 'epoch': 0.0} 18%|█▊ | 89/500 [00:23<01:35, 4.31it/s] 18%|█▊ | 90/500 [00:23<01:35, 4.31it/s] {'loss': 
0.0, 'learning_rate': 8.2e-05, 'epoch': 0.0} 18%|█▊ | 90/500 [00:23<01:35, 4.31it/s] 18%|█▊ | 91/500 [00:23<01:34, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.18e-05, 'epoch': 0.0} 18%|█▊ | 91/500 [00:23<01:34, 4.32it/s] 18%|█▊ | 92/500 [00:24<01:34, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.16e-05, 'epoch': 0.0} 18%|█▊ | 92/500 [00:24<01:34, 4.32it/s] 19%|█▊ | 93/500 [00:24<01:34, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.14e-05, 'epoch': 0.0} 19%|█▊ | 93/500 [00:24<01:34, 4.30it/s] 19%|█▉ | 94/500 [00:24<01:34, 4.28it/s] {'loss': 0.0, 'learning_rate': 8.120000000000001e-05, 'epoch': 0.0} 19%|█▉ | 94/500 [00:24<01:34, 4.28it/s] 19%|█▉ | 95/500 [00:24<01:34, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.1e-05, 'epoch': 0.0} 19%|█▉ | 95/500 [00:24<01:34, 4.29it/s] 19%|█▉ | 96/500 [00:25<01:33, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.080000000000001e-05, 'epoch': 0.0} 19%|█▉ | 96/500 [00:25<01:33, 4.30it/s] 19%|█▉ | 97/500 [00:25<01:33, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.060000000000001e-05, 'epoch': 0.0} 19%|█▉ | 97/500 [00:25<01:33, 4.33it/s] 20%|█▉ | 98/500 [00:25<01:32, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.04e-05, 'epoch': 0.0} 20%|█▉ | 98/500 [00:25<01:32, 4.33it/s] 20%|█▉ | 99/500 [00:25<01:32, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.020000000000001e-05, 'epoch': 0.0} 20%|█▉ | 99/500 [00:25<01:32, 4.33it/s] 20%|██ | 100/500 [00:26<01:32, 4.34it/s] {'loss': 0.0, 'learning_rate': 8e-05, 'epoch': 0.0} 20%|██ | 100/500 [00:26<01:32, 4.34it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:27:49,786 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:27:49,786 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-100/special_tokens_map.json
20%|██ | 101/500 [00:26<01:37, 4.08it/s] {'loss': 0.0, 'learning_rate': 7.98e-05, 'epoch': 0.0} 20%|██ | 101/500 [00:26<01:37, 4.08it/s] 20%|██ | 102/500 [00:26<01:35, 4.17it/s] 
{'loss': 0.0, 'learning_rate': 7.960000000000001e-05, 'epoch': 0.0} 20%|██ | 102/500 [00:26<01:35, 4.17it/s] 21%|██ | 103/500 [00:26<01:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.94e-05, 'epoch': 0.0} 21%|██ | 103/500 [00:26<01:33, 4.23it/s] 21%|██ | 104/500 [00:27<01:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.920000000000001e-05, 'epoch': 0.0} 21%|██ | 104/500 [00:27<01:33, 4.23it/s] 21%|██ | 105/500 [00:27<01:32, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.900000000000001e-05, 'epoch': 0.0} 21%|██ | 105/500 [00:27<01:32, 4.25it/s] 21%|██ | 106/500 [00:27<01:31, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.88e-05, 'epoch': 0.0} 21%|██ | 106/500 [00:27<01:31, 4.29it/s] 21%|██▏ | 107/500 [00:27<01:31, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.860000000000001e-05, 'epoch': 0.0} 21%|██▏ | 107/500 [00:27<01:31, 4.30it/s] 22%|██▏ | 108/500 [00:27<01:30, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.840000000000001e-05, 'epoch': 0.0} 22%|██▏ | 108/500 [00:27<01:30, 4.31it/s] 22%|██▏ | 109/500 [00:28<01:31, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.82e-05, 'epoch': 0.0} 22%|██▏ | 109/500 [00:28<01:31, 4.29it/s] 22%|██▏ | 110/500 [00:28<01:31, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.0} 22%|██▏ | 110/500 [00:28<01:31, 4.28it/s] 22%|██▏ | 111/500 [00:28<01:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.780000000000001e-05, 'epoch': 0.0} 22%|██▏ | 111/500 [00:28<01:31, 4.27it/s] 22%|██▏ | 112/500 [00:28<01:30, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.76e-05, 'epoch': 0.0} 22%|██▏ | 112/500 [00:28<01:30, 4.30it/s] 23%|██▎ | 113/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.740000000000001e-05, 'epoch': 0.0} 23%|██▎ | 113/500 [00:29<01:29, 4.31it/s] 23%|██▎ | 114/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.72e-05, 'epoch': 0.0} 23%|██▎ | 114/500 [00:29<01:29, 4.31it/s] 23%|██▎ | 115/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.7e-05, 'epoch': 0.0} 23%|██▎ | 115/500 [00:29<01:29, 4.31it/s] 
23%|██▎ | 116/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.680000000000001e-05, 'epoch': 0.0} 23%|██▎ | 116/500 [00:29<01:29, 4.31it/s] 23%|██▎ | 117/500 [00:30<01:28, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.66e-05, 'epoch': 0.0} 23%|██▎ | 117/500 [00:30<01:28, 4.32it/s] 24%|██▎ | 118/500 [00:30<01:28, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.64e-05, 'epoch': 0.0} 24%|██▎ | 118/500 [00:30<01:28, 4.31it/s] 24%|██▍ | 119/500 [00:30<01:28, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.620000000000001e-05, 'epoch': 0.0} 24%|██▍ | 119/500 [00:30<01:28, 4.32it/s] 24%|██▍ | 120/500 [00:30<01:27, 4.33it/s] {'loss': 0.0, 'learning_rate': 7.6e-05, 'epoch': 0.0} 24%|██▍ | 120/500 [00:30<01:27, 4.33it/s] 24%|██▍ | 121/500 [00:30<01:27, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.58e-05, 'epoch': 0.0} 24%|██▍ | 121/500 [00:30<01:27, 4.32it/s] 24%|██▍ | 122/500 [00:31<01:27, 4.34it/s] {'loss': 0.0, 'learning_rate': 7.560000000000001e-05, 'epoch': 0.0} 24%|██▍ | 122/500 [00:31<01:27, 4.34it/s] 25%|██▍ | 123/500 [00:31<01:27, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.54e-05, 'epoch': 0.0} 25%|██▍ | 123/500 [00:31<01:27, 4.29it/s] 25%|██▍ | 124/500 [00:31<01:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.52e-05, 'epoch': 0.0} 25%|██▍ | 124/500 [00:31<01:28, 4.26it/s] 25%|██▌ | 125/500 [00:31<01:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.500000000000001e-05, 'epoch': 0.0} 25%|██▌ | 125/500 [00:31<01:28, 4.26it/s] 25%|██▌ | 126/500 [00:32<01:27, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.48e-05, 'epoch': 0.0} 25%|██▌ | 126/500 [00:32<01:27, 4.26it/s] 25%|██▌ | 127/500 [00:32<01:27, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.46e-05, 'epoch': 0.0} 25%|██▌ | 127/500 [00:32<01:27, 4.28it/s] 26%|██▌ | 128/500 [00:32<01:26, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.44e-05, 'epoch': 0.0} 26%|██▌ | 128/500 [00:32<01:26, 4.30it/s] 26%|██▌ | 129/500 [00:32<01:26, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.42e-05, 'epoch': 0.0} 26%|██▌ | 129/500 [00:32<01:26, 4.29it/s] 
26%|██▌ | 130/500 [00:33<01:25, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.4e-05, 'epoch': 0.0} 26%|██▌ | 130/500 [00:33<01:25, 4.30it/s] 26%|██▌ | 131/500 [00:33<01:26, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.38e-05, 'epoch': 0.01} 26%|██▌ | 131/500 [00:33<01:26, 4.28it/s] 26%|██▋ | 132/500 [00:33<01:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.36e-05, 'epoch': 0.01} 26%|██▋ | 132/500 [00:33<01:26, 4.27it/s] 27%|██▋ | 133/500 [00:33<01:26, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.340000000000001e-05, 'epoch': 0.01} 27%|██▋ | 133/500 [00:33<01:26, 4.26it/s] 27%|██▋ | 134/500 [00:33<01:26, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.32e-05, 'epoch': 0.01} 27%|██▋ | 134/500 [00:34<01:26, 4.25it/s] 27%|██▋ | 135/500 [00:34<01:25, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.3e-05, 'epoch': 0.01} 27%|██▋ | 135/500 [00:34<01:25, 4.25it/s] 27%|██▋ | 136/500 [00:34<01:25, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.280000000000001e-05, 'epoch': 0.01} 27%|██▋ | 136/500 [00:34<01:25, 4.24it/s] 27%|██▋ | 137/500 [00:34<01:25, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.26e-05, 'epoch': 0.01} 27%|██▋ | 137/500 [00:34<01:25, 4.23it/s] 28%|██▊ | 138/500 [00:34<01:25, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.24e-05, 'epoch': 0.01} 28%|██▊ | 138/500 [00:34<01:25, 4.25it/s] 28%|██▊ | 139/500 [00:35<01:24, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.22e-05, 'epoch': 0.01} 28%|██▊ | 139/500 [00:35<01:24, 4.25it/s] 28%|██▊ | 140/500 [00:35<01:24, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.2e-05, 'epoch': 0.01} 28%|██▊ | 140/500 [00:35<01:24, 4.24it/s] 28%|██▊ | 141/500 [00:35<01:24, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.18e-05, 'epoch': 0.01} 28%|██▊ | 141/500 [00:35<01:24, 4.23it/s] 28%|██▊ | 142/500 [00:35<01:24, 4.22it/s] {'loss': 0.0, 'learning_rate': 7.16e-05, 'epoch': 0.01} 28%|██▊ | 142/500 [00:35<01:24, 4.22it/s] 29%|██▊ | 143/500 [00:36<01:24, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.14e-05, 'epoch': 0.01} 29%|██▊ | 143/500 [00:36<01:24, 4.24it/s] 29%|██▉ | 144/500 
[00:36<01:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.12e-05, 'epoch': 0.01} 29%|██▉ | 144/500 [00:36<01:23, 4.26it/s] 29%|██▉ | 145/500 [00:36<01:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.1e-05, 'epoch': 0.01} 29%|██▉ | 145/500 [00:36<01:23, 4.26it/s] 29%|██▉ | 146/500 [00:36<01:22, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.08e-05, 'epoch': 0.01} 29%|██▉ | 146/500 [00:36<01:22, 4.28it/s] 29%|██▉ | 147/500 [00:37<01:22, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.06e-05, 'epoch': 0.01} 29%|██▉ | 147/500 [00:37<01:22, 4.25it/s] 30%|██▉ | 148/500 [00:37<01:22, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.04e-05, 'epoch': 0.01} 30%|██▉ | 148/500 [00:37<01:22, 4.28it/s] 30%|██▉ | 149/500 [00:37<01:21, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.02e-05, 'epoch': 0.01} 30%|██▉ | 149/500 [00:37<01:21, 4.28it/s] 30%|███ | 150/500 [00:37<01:21, 4.27it/s] {'loss': 0.0, 'learning_rate': 7e-05, 'epoch': 0.01} 30%|███ | 150/500 [00:37<01:21, 4.27it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:01,507 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-150/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:01,507 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-150/special_tokens_map.json
30%|███ | 151/500 [00:38<01:26, 4.04it/s] {'loss': 0.0, 'learning_rate': 6.98e-05, 'epoch': 0.01} 30%|███ | 151/500 [00:38<01:26, 4.04it/s] 30%|███ | 152/500 [00:38<01:24, 4.13it/s] {'loss': 0.0, 'learning_rate': 6.96e-05, 'epoch': 0.01} 30%|███ | 152/500 [00:38<01:24, 4.13it/s] 31%|███ | 153/500 [00:38<01:23, 4.16it/s] {'loss': 0.0, 'learning_rate': 6.939999999999999e-05, 'epoch': 0.01} 31%|███ | 153/500 [00:38<01:23, 4.16it/s] 31%|███ | 154/500 [00:38<01:22, 4.21it/s] {'loss': 0.0, 'learning_rate': 6.92e-05, 'epoch': 0.01} 31%|███ | 154/500 [00:38<01:22, 4.21it/s] 31%|███ | 155/500 [00:38<01:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 6.9e-05, 'epoch': 0.01} 31%|███ | 155/500 [00:38<01:21, 
4.24it/s] 31%|███ | 156/500 [00:39<01:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 6.879999999999999e-05, 'epoch': 0.01} 31%|███ | 156/500 [00:39<01:21, 4.24it/s] 31%|███▏ | 157/500 [00:39<01:20, 4.25it/s] {'loss': 0.0, 'learning_rate': 6.860000000000001e-05, 'epoch': 0.01} 31%|███▏ | 157/500 [00:39<01:20, 4.25it/s] 32%|███▏ | 158/500 [00:39<01:20, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.840000000000001e-05, 'epoch': 0.01} 32%|███▏ | 158/500 [00:39<01:20, 4.26it/s] 32%|███▏ | 159/500 [00:39<01:19, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.82e-05, 'epoch': 0.01} 32%|███▏ | 159/500 [00:39<01:19, 4.29it/s] 32%|███▏ | 160/500 [00:40<01:19, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.01} 32%|███▏ | 160/500 [00:40<01:19, 4.27it/s] 32%|███▏ | 161/500 [00:40<01:19, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.780000000000001e-05, 'epoch': 0.01} 32%|███▏ | 161/500 [00:40<01:19, 4.29it/s] 32%|███▏ | 162/500 [00:40<01:18, 4.30it/s] {'loss': 0.0, 'learning_rate': 6.76e-05, 'epoch': 0.01} 32%|███▏ | 162/500 [00:40<01:18, 4.30it/s] 33%|███▎ | 163/500 [00:40<01:18, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.740000000000001e-05, 'epoch': 0.01} 33%|███▎ | 163/500 [00:40<01:18, 4.28it/s] 33%|███▎ | 164/500 [00:41<01:18, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.720000000000001e-05, 'epoch': 0.01} 33%|███▎ | 164/500 [00:41<01:18, 4.29it/s] 33%|███▎ | 165/500 [00:41<01:18, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.7e-05, 'epoch': 0.01} 33%|███▎ | 165/500 [00:41<01:18, 4.27it/s] 33%|███▎ | 166/500 [00:41<01:18, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.680000000000001e-05, 'epoch': 0.01} 33%|███▎ | 166/500 [00:41<01:18, 4.26it/s] 33%|███▎ | 167/500 [00:41<01:18, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.66e-05, 'epoch': 0.01} 33%|███▎ | 167/500 [00:41<01:18, 4.26it/s] 34%|███▎ | 168/500 [00:42<01:18, 4.23it/s] {'loss': 0.0, 'learning_rate': 6.64e-05, 'epoch': 0.01} 34%|███▎ | 168/500 [00:42<01:18, 4.23it/s] 34%|███▍ | 169/500 [00:42<01:18, 4.22it/s] 
{'loss': 0.0, 'learning_rate': 6.620000000000001e-05, 'epoch': 0.01} 34%|███▍ | 169/500 [00:42<01:18, 4.22it/s] 34%|███▍ | 170/500 [00:42<01:19, 4.16it/s] {'loss': 0.0, 'learning_rate': 6.6e-05, 'epoch': 0.01} 34%|███▍ | 170/500 [00:42<01:19, 4.16it/s] 34%|███▍ | 171/500 [00:42<01:31, 3.60it/s] {'loss': 0.0, 'learning_rate': 6.58e-05, 'epoch': 0.01} 34%|███▍ | 171/500 [00:42<01:31, 3.60it/s] 34%|███▍ | 172/500 [00:43<01:36, 3.39it/s] {'loss': 0.0, 'learning_rate': 6.560000000000001e-05, 'epoch': 0.01} 34%|███▍ | 172/500 [00:43<01:36, 3.39it/s] 35%|███▍ | 173/500 [00:43<01:32, 3.53it/s] {'loss': 0.0, 'learning_rate': 6.54e-05, 'epoch': 0.01} 35%|███▍ | 173/500 [00:43<01:32, 3.53it/s] 35%|███▍ | 174/500 [00:43<01:27, 3.72it/s] {'loss': 0.0, 'learning_rate': 6.52e-05, 'epoch': 0.01} 35%|███▍ | 174/500 [00:43<01:27, 3.72it/s] 35%|███▌ | 175/500 [00:43<01:24, 3.86it/s] {'loss': 0.0, 'learning_rate': 6.500000000000001e-05, 'epoch': 0.01} 35%|███▌ | 175/500 [00:43<01:24, 3.86it/s] 35%|███▌ | 176/500 [00:44<01:21, 3.96it/s] {'loss': 0.0, 'learning_rate': 6.48e-05, 'epoch': 0.01} 35%|███▌ | 176/500 [00:44<01:21, 3.96it/s] 35%|███▌ | 177/500 [00:44<01:19, 4.05it/s] {'loss': 0.0, 'learning_rate': 6.460000000000001e-05, 'epoch': 0.01} 35%|███▌ | 177/500 [00:44<01:19, 4.05it/s] 36%|███▌ | 178/500 [00:44<01:17, 4.14it/s] {'loss': 0.0, 'learning_rate': 6.440000000000001e-05, 'epoch': 0.01} 36%|███▌ | 178/500 [00:44<01:17, 4.14it/s] 36%|███▌ | 179/500 [00:44<01:16, 4.19it/s] {'loss': 0.0, 'learning_rate': 6.42e-05, 'epoch': 0.01} 36%|███▌ | 179/500 [00:44<01:16, 4.19it/s] 36%|███▌ | 180/500 [00:45<01:16, 4.20it/s] {'loss': 0.0, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.01} 36%|███▌ | 180/500 [00:45<01:16, 4.20it/s] 36%|███▌ | 181/500 [00:45<01:15, 4.23it/s] {'loss': 0.0, 'learning_rate': 6.38e-05, 'epoch': 0.01} 36%|███▌ | 181/500 [00:45<01:15, 4.23it/s] 36%|███▋ | 182/500 [00:45<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.36e-05, 'epoch': 0.01} 36%|███▋ | 182/500 
[00:45<01:14, 4.27it/s] 37%|███▋ | 183/500 [00:45<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.340000000000001e-05, 'epoch': 0.01} 37%|███▋ | 183/500 [00:45<01:14, 4.27it/s] 37%|███▋ | 184/500 [00:46<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.32e-05, 'epoch': 0.01} 37%|███▋ | 184/500 [00:46<01:14, 4.27it/s] 37%|███▋ | 185/500 [00:46<01:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.3e-05, 'epoch': 0.01} 37%|███▋ | 185/500 [00:46<01:13, 4.27it/s] 37%|███▋ | 186/500 [00:46<01:13, 4.25it/s] {'loss': 0.0, 'learning_rate': 6.280000000000001e-05, 'epoch': 0.01} 37%|███▋ | 186/500 [00:46<01:13, 4.25it/s] 37%|███▋ | 187/500 [00:46<01:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.26e-05, 'epoch': 0.01} 37%|███▋ | 187/500 [00:46<01:13, 4.26it/s] 38%|███▊ | 188/500 [00:46<01:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.24e-05, 'epoch': 0.01} 38%|███▊ | 188/500 [00:46<01:13, 4.27it/s] 38%|███▊ | 189/500 [00:47<01:12, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.220000000000001e-05, 'epoch': 0.01} 38%|███▊ | 189/500 [00:47<01:12, 4.28it/s] 38%|███▊ | 190/500 [00:47<01:12, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.2e-05, 'epoch': 0.01} 38%|███▊ | 190/500 [00:47<01:12, 4.29it/s] 38%|███▊ | 191/500 [00:47<01:12, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.18e-05, 'epoch': 0.01} 38%|███▊ | 191/500 [00:47<01:12, 4.28it/s] 38%|███▊ | 192/500 [00:47<01:11, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.16e-05, 'epoch': 0.01} 38%|███▊ | 192/500 [00:47<01:11, 4.29it/s] 39%|███▊ | 193/500 [00:48<01:11, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.14e-05, 'epoch': 0.01} 39%|███▊ | 193/500 [00:48<01:11, 4.28it/s] 39%|███▉ | 194/500 [00:48<01:11, 4.30it/s] {'loss': 0.0, 'learning_rate': 6.12e-05, 'epoch': 0.01} 39%|███▉ | 194/500 [00:48<01:11, 4.30it/s] 39%|███▉ | 195/500 [00:48<01:10, 4.31it/s] {'loss': 0.0, 'learning_rate': 6.1e-05, 'epoch': 0.01} 39%|███▉ | 195/500 [00:48<01:10, 4.31it/s] 39%|███▉ | 196/500 [00:48<01:17, 3.93it/s] {'loss': 0.0, 'learning_rate': 6.08e-05, 'epoch': 
0.01} 39%|███▉ | 196/500 [00:48<01:17, 3.93it/s] 39%|███▉ | 197/500 [00:49<01:20, 3.79it/s] {'loss': 0.0, 'learning_rate': 6.06e-05, 'epoch': 0.01} 39%|███▉ | 197/500 [00:49<01:20, 3.79it/s] 40%|███▉ | 198/500 [00:49<01:43, 2.92it/s] {'loss': 0.0, 'learning_rate': 6.04e-05, 'epoch': 0.01} 40%|███▉ | 198/500 [00:49<01:43, 2.92it/s] 40%|███▉ | 199/500 [00:50<01:56, 2.58it/s] {'loss': 0.0, 'learning_rate': 6.02e-05, 'epoch': 0.01} 40%|███▉ | 199/500 [00:50<01:56, 2.58it/s] 40%|████ | 200/500 [00:50<01:52, 2.67it/s] {'loss': 0.0, 'learning_rate': 6e-05, 'epoch': 0.01} 40%|████ | 200/500 [00:50<01:52, 2.67it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:14,303 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:14,304 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-200/special_tokens_map.json
40%|████ | 201/500 [00:50<01:52, 2.65it/s] {'loss': 0.0, 'learning_rate': 5.9800000000000003e-05, 'epoch': 0.01} 40%|████ | 201/500 [00:50<01:52, 2.65it/s] 40%|████ | 202/500 [00:51<01:46, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.96e-05, 'epoch': 0.01} 40%|████ | 202/500 [00:51<01:46, 2.79it/s] 41%|████ | 203/500 [00:51<01:51, 2.67it/s] {'loss': 0.0, 'learning_rate': 5.94e-05, 'epoch': 0.01} 41%|████ | 203/500 [00:51<01:51, 2.67it/s] 41%|████ | 204/500 [00:51<01:48, 2.74it/s] {'loss': 0.0, 'learning_rate': 5.92e-05, 'epoch': 0.01} 41%|████ | 204/500 [00:52<01:48, 2.74it/s] 41%|████ | 205/500 [00:52<01:39, 2.97it/s] {'loss': 0.0, 'learning_rate': 5.9e-05, 'epoch': 0.01} 41%|████ | 205/500 [00:52<01:39, 2.97it/s] 41%|████ | 206/500 [00:52<01:42, 2.88it/s] {'loss': 0.0, 'learning_rate': 5.88e-05, 'epoch': 0.01} 41%|████ | 206/500 [00:52<01:42, 2.88it/s] 41%|████▏ | 207/500 [00:52<01:33, 3.14it/s] {'loss': 0.0, 'learning_rate': 5.86e-05, 'epoch': 0.01} 41%|████▏ | 207/500 [00:52<01:33, 3.14it/s] 42%|████▏ | 208/500 [00:53<01:31, 3.19it/s] 
{'loss': 0.0, 'learning_rate': 5.8399999999999997e-05, 'epoch': 0.01} 42%|████▏ | 208/500 [00:53<01:31, 3.19it/s] 42%|████▏ | 209/500 [00:53<01:30, 3.20it/s] {'loss': 0.0, 'learning_rate': 5.82e-05, 'epoch': 0.01} 42%|████▏ | 209/500 [00:53<01:30, 3.20it/s] 42%|████▏ | 210/500 [00:53<01:35, 3.03it/s] {'loss': 0.0, 'learning_rate': 5.8e-05, 'epoch': 0.01} 42%|████▏ | 210/500 [00:53<01:35, 3.03it/s] 42%|████▏ | 211/500 [00:54<01:33, 3.09it/s] {'loss': 0.0, 'learning_rate': 5.7799999999999995e-05, 'epoch': 0.01} 42%|████▏ | 211/500 [00:54<01:33, 3.09it/s] 42%|████▏ | 212/500 [00:54<01:38, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.76e-05, 'epoch': 0.01} 42%|████▏ | 212/500 [00:54<01:38, 2.91it/s] 43%|████▎ | 213/500 [00:54<01:42, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.74e-05, 'epoch': 0.01} 43%|████▎ | 213/500 [00:54<01:42, 2.79it/s] 43%|████▎ | 214/500 [00:55<01:38, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.72e-05, 'epoch': 0.01} 43%|████▎ | 214/500 [00:55<01:38, 2.91it/s] 43%|████▎ | 215/500 [00:55<01:37, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.6999999999999996e-05, 'epoch': 0.01} 43%|████▎ | 215/500 [00:55<01:37, 2.91it/s] 43%|████▎ | 216/500 [00:55<01:40, 2.84it/s] {'loss': 0.0, 'learning_rate': 5.68e-05, 'epoch': 0.01} 43%|████▎ | 216/500 [00:55<01:40, 2.84it/s] 43%|████▎ | 217/500 [00:56<01:31, 3.11it/s] {'loss': 0.0, 'learning_rate': 5.66e-05, 'epoch': 0.01} 43%|████▎ | 217/500 [00:56<01:31, 3.11it/s] 44%|████▎ | 218/500 [00:56<01:30, 3.12it/s] {'loss': 0.0, 'learning_rate': 5.6399999999999995e-05, 'epoch': 0.01} 44%|████▎ | 218/500 [00:56<01:30, 3.12it/s] 44%|████▍ | 219/500 [00:56<01:29, 3.14it/s] {'loss': 0.0, 'learning_rate': 5.620000000000001e-05, 'epoch': 0.01} 44%|████▍ | 219/500 [00:56<01:29, 3.14it/s] 44%|████▍ | 220/500 [00:57<01:31, 3.08it/s] {'loss': 0.0, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.01} 44%|████▍ | 220/500 [00:57<01:31, 3.08it/s] 44%|████▍ | 221/500 [00:57<01:37, 2.86it/s] {'loss': 0.0, 'learning_rate': 
5.580000000000001e-05, 'epoch': 0.01} 44%|████▍ | 221/500 [00:57<01:37, 2.86it/s] 44%|████▍ | 222/500 [00:57<01:39, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.560000000000001e-05, 'epoch': 0.01} 44%|████▍ | 222/500 [00:58<01:39, 2.79it/s] 45%|████▍ | 223/500 [00:58<01:38, 2.80it/s] {'loss': 0.0, 'learning_rate': 5.5400000000000005e-05, 'epoch': 0.01} 45%|████▍ | 223/500 [00:58<01:38, 2.80it/s] 45%|████▍ | 224/500 [00:58<01:35, 2.89it/s] {'loss': 0.0, 'learning_rate': 5.520000000000001e-05, 'epoch': 0.01} 45%|████▍ | 224/500 [00:58<01:35, 2.89it/s] 45%|████▌ | 225/500 [00:59<01:39, 2.77it/s] {'loss': 0.0, 'learning_rate': 5.500000000000001e-05, 'epoch': 0.01} 45%|████▌ | 225/500 [00:59<01:39, 2.77it/s] 45%|████▌ | 226/500 [00:59<01:40, 2.72it/s] {'loss': 0.0, 'learning_rate': 5.4800000000000004e-05, 'epoch': 0.01} 45%|████▌ | 226/500 [00:59<01:40, 2.72it/s] 45%|████▌ | 227/500 [00:59<01:40, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.4600000000000006e-05, 'epoch': 0.01} 45%|████▌ | 227/500 [00:59<01:40, 2.71it/s] 46%|████▌ | 228/500 [01:00<01:40, 2.70it/s] {'loss': 0.0, 'learning_rate': 5.440000000000001e-05, 'epoch': 0.01} 46%|████▌ | 228/500 [01:00<01:40, 2.70it/s] 46%|████▌ | 229/500 [01:00<01:47, 2.53it/s] {'loss': 0.0, 'learning_rate': 5.420000000000001e-05, 'epoch': 0.01} 46%|████▌ | 229/500 [01:00<01:47, 2.53it/s] 46%|████▌ | 230/500 [01:00<01:40, 2.69it/s] {'loss': 0.0, 'learning_rate': 5.4000000000000005e-05, 'epoch': 0.01} 46%|████▌ | 230/500 [01:00<01:40, 2.69it/s] 46%|████▌ | 231/500 [01:01<01:35, 2.83it/s] {'loss': 0.0, 'learning_rate': 5.380000000000001e-05, 'epoch': 0.01} 46%|████▌ | 231/500 [01:01<01:35, 2.83it/s] 46%|████▋ | 232/500 [01:01<01:27, 3.08it/s] {'loss': 0.0, 'learning_rate': 5.360000000000001e-05, 'epoch': 0.01} 46%|████▋ | 232/500 [01:01<01:27, 3.08it/s] 47%|████▋ | 233/500 [01:01<01:28, 3.02it/s] {'loss': 0.0, 'learning_rate': 5.3400000000000004e-05, 'epoch': 0.01} 47%|████▋ | 233/500 [01:01<01:28, 3.02it/s] 47%|████▋ | 234/500 
[01:02<01:30, 2.94it/s] {'loss': 0.0, 'learning_rate': 5.3200000000000006e-05, 'epoch': 0.01} 47%|████▋ | 234/500 [01:02<01:30, 2.94it/s] 47%|████▋ | 235/500 [01:02<01:37, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.300000000000001e-05, 'epoch': 0.01} 47%|████▋ | 235/500 [01:02<01:37, 2.71it/s] 47%|████▋ | 236/500 [01:03<01:45, 2.49it/s] {'loss': 0.0, 'learning_rate': 5.28e-05, 'epoch': 0.01} 47%|████▋ | 236/500 [01:03<01:45, 2.49it/s] 47%|████▋ | 237/500 [01:03<01:49, 2.39it/s] {'loss': 0.0, 'learning_rate': 5.2600000000000005e-05, 'epoch': 0.01} 47%|████▋ | 237/500 [01:03<01:49, 2.39it/s] 48%|████▊ | 238/500 [01:03<01:42, 2.55it/s] {'loss': 0.0, 'learning_rate': 5.2400000000000007e-05, 'epoch': 0.01} 48%|████▊ | 238/500 [01:03<01:42, 2.55it/s] 48%|████▊ | 239/500 [01:04<01:36, 2.72it/s] {'loss': 0.0, 'learning_rate': 5.22e-05, 'epoch': 0.01} 48%|████▊ | 239/500 [01:04<01:36, 2.72it/s] 48%|████▊ | 240/500 [01:04<01:36, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.01} 48%|████▊ | 240/500 [01:04<01:36, 2.71it/s] 48%|████▊ | 241/500 [01:05<01:35, 2.70it/s] {'loss': 0.0, 'learning_rate': 5.1800000000000005e-05, 'epoch': 0.01} 48%|████▊ | 241/500 [01:05<01:35, 2.70it/s] 48%|████▊ | 242/500 [01:05<01:29, 2.88it/s] {'loss': 0.0, 'learning_rate': 5.16e-05, 'epoch': 0.01} 48%|████▊ | 242/500 [01:05<01:29, 2.88it/s] 49%|████▊ | 243/500 [01:05<01:30, 2.83it/s] {'loss': 0.0, 'learning_rate': 5.14e-05, 'epoch': 0.01} 49%|████▊ | 243/500 [01:05<01:30, 2.83it/s] 49%|████▉ | 244/500 [01:06<01:29, 2.87it/s] {'loss': 0.0, 'learning_rate': 5.1200000000000004e-05, 'epoch': 0.01} 49%|████▉ | 244/500 [01:06<01:29, 2.87it/s] 49%|████▉ | 245/500 [01:06<01:23, 3.04it/s] {'loss': 0.0, 'learning_rate': 5.1000000000000006e-05, 'epoch': 0.01} 49%|████▉ | 245/500 [01:06<01:23, 3.04it/s] 49%|████▉ | 246/500 [01:06<01:23, 3.04it/s] {'loss': 0.0, 'learning_rate': 5.08e-05, 'epoch': 0.01} 49%|████▉ | 246/500 [01:06<01:23, 3.04it/s] 49%|████▉ | 247/500 [01:06<01:21, 
3.12it/s] {'loss': 0.0, 'learning_rate': 5.0600000000000003e-05, 'epoch': 0.01} 49%|████▉ | 247/500 [01:06<01:21, 3.12it/s] 50%|████▉ | 248/500 [01:07<01:22, 3.07it/s] {'loss': 0.0, 'learning_rate': 5.0400000000000005e-05, 'epoch': 0.01} 50%|████▉ | 248/500 [01:07<01:22, 3.07it/s] 50%|████▉ | 249/500 [01:07<01:21, 3.06it/s] {'loss': 0.0, 'learning_rate': 5.02e-05, 'epoch': 0.01} 50%|████▉ | 249/500 [01:07<01:21, 3.06it/s] 50%|█████ | 250/500 [01:07<01:19, 3.13it/s] {'loss': 0.0, 'learning_rate': 5e-05, 'epoch': 0.01} 50%|█████ | 250/500 [01:07<01:19, 3.13it/s][INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:31,637 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-250/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:31,638 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-250/special_tokens_map.json 50%|█████ | 251/500 [01:08<01:27, 2.85it/s] {'loss': 0.0, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.01} 50%|█████ | 251/500 [01:08<01:27, 2.85it/s] 50%|█████ | 252/500 [01:08<01:25, 2.90it/s] {'loss': 0.0, 'learning_rate': 4.96e-05, 'epoch': 0.01} 50%|█████ | 252/500 [01:08<01:25, 2.90it/s] 51%|█████ | 253/500 [01:08<01:22, 2.99it/s] {'loss': 0.0, 'learning_rate': 4.94e-05, 'epoch': 0.01} 51%|█████ | 253/500 [01:08<01:22, 2.99it/s] 51%|█████ | 254/500 [01:09<01:25, 2.89it/s] {'loss': 0.0, 'learning_rate': 4.92e-05, 'epoch': 0.01} 51%|█████ | 254/500 [01:09<01:25, 2.89it/s] 51%|█████ | 255/500 [01:09<01:24, 2.88it/s] {'loss': 0.0, 'learning_rate': 4.9e-05, 'epoch': 0.01} 51%|█████ | 255/500 [01:09<01:24, 2.88it/s] 51%|█████ | 256/500 [01:10<01:26, 2.84it/s] {'loss': 0.0, 'learning_rate': 4.88e-05, 'epoch': 0.01} 51%|█████ | 256/500 [01:10<01:26, 2.84it/s] 51%|█████▏ | 257/500 [01:10<01:21, 2.97it/s] {'loss': 0.0, 'learning_rate': 4.86e-05, 'epoch': 0.01} 51%|█████▏ | 257/500 [01:10<01:21, 2.97it/s] 52%|█████▏ | 258/500 [01:10<01:22, 2.93it/s] {'loss': 0.0, 'learning_rate': 
4.8400000000000004e-05, 'epoch': 0.01} 52%|█████▏ | 258/500 [01:10<01:22, 2.93it/s] 52%|█████▏ | 259/500 [01:10<01:14, 3.24it/s] {'loss': 0.0, 'learning_rate': 4.82e-05, 'epoch': 0.01} 52%|█████▏ | 259/500 [01:10<01:14, 3.24it/s] 52%|█████▏ | 260/500 [01:11<01:08, 3.49it/s] {'loss': 0.0, 'learning_rate': 4.8e-05, 'epoch': 0.01} 52%|█████▏ | 260/500 [01:11<01:08, 3.49it/s] 52%|█████▏ | 261/500 [01:11<01:06, 3.58it/s] {'loss': 0.0, 'learning_rate': 4.78e-05, 'epoch': 0.01} 52%|█████▏ | 261/500 [01:11<01:06, 3.58it/s] 52%|█████▏ | 262/500 [01:11<01:13, 3.22it/s] {'loss': 0.0, 'learning_rate': 4.76e-05, 'epoch': 0.01} 52%|█████▏ | 262/500 [01:11<01:13, 3.22it/s] 53%|█████▎ | 263/500 [01:12<01:09, 3.40it/s] {'loss': 0.0, 'learning_rate': 4.74e-05, 'epoch': 0.01} 53%|█████▎ | 263/500 [01:12<01:09, 3.40it/s] 53%|█████▎ | 264/500 [01:12<01:15, 3.14it/s] {'loss': 0.0, 'learning_rate': 4.72e-05, 'epoch': 0.01} 53%|█████▎ | 264/500 [01:12<01:15, 3.14it/s] 53%|█████▎ | 265/500 [01:12<01:18, 3.00it/s] {'loss': 0.0, 'learning_rate': 4.7e-05, 'epoch': 0.01} 53%|█████▎ | 265/500 [01:12<01:18, 3.00it/s] 53%|█████▎ | 266/500 [01:13<01:19, 2.94it/s] {'loss': 0.0, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.01} 53%|█████▎ | 266/500 [01:13<01:19, 2.94it/s] 53%|█████▎ | 267/500 [01:13<01:20, 2.91it/s] {'loss': 0.0, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.01} 53%|█████▎ | 267/500 [01:13<01:20, 2.91it/s] 54%|█████▎ | 268/500 [01:13<01:23, 2.77it/s] {'loss': 0.0, 'learning_rate': 4.64e-05, 'epoch': 0.01} 54%|█████▎ | 268/500 [01:13<01:23, 2.77it/s] 54%|█████▍ | 269/500 [01:14<01:19, 2.91it/s] {'loss': 0.0, 'learning_rate': 4.6200000000000005e-05, 'epoch': 0.01} 54%|█████▍ | 269/500 [01:14<01:19, 2.91it/s] 54%|█████▍ | 270/500 [01:14<01:13, 3.15it/s] {'loss': 0.0, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.01} 54%|█████▍ | 270/500 [01:14<01:13, 3.15it/s] 54%|█████▍ | 271/500 [01:14<01:15, 3.04it/s] {'loss': 0.0, 'learning_rate': 4.58e-05, 'epoch': 0.01} 54%|█████▍ 
| 271/500 [01:14<01:15, 3.04it/s] 54%|█████▍ | 272/500 [01:15<01:13, 3.10it/s] {'loss': 0.0, 'learning_rate': 4.5600000000000004e-05, 'epoch': 0.01} 54%|█████▍ | 272/500 [01:15<01:13, 3.10it/s] 55%|█████▍ | 273/500 [01:15<01:12, 3.13it/s] {'loss': 0.0, 'learning_rate': 4.5400000000000006e-05, 'epoch': 0.01} 55%|█████▍ | 273/500 [01:15<01:12, 3.13it/s] 55%|█████▍ | 274/500 [01:15<01:18, 2.89it/s] {'loss': 0.0, 'learning_rate': 4.52e-05, 'epoch': 0.01} 55%|█████▍ | 274/500 [01:15<01:18, 2.89it/s] 55%|█████▌ | 275/500 [01:16<01:11, 3.13it/s] {'loss': 0.0, 'learning_rate': 4.5e-05, 'epoch': 0.01} 55%|█████▌ | 275/500 [01:16<01:11, 3.13it/s] 55%|█████▌ | 276/500 [01:16<01:15, 2.99it/s] {'loss': 0.0, 'learning_rate': 4.4800000000000005e-05, 'epoch': 0.01} 55%|█████▌ | 276/500 [01:16<01:15, 2.99it/s] 55%|█████▌ | 277/500 [01:16<01:15, 2.97it/s] {'loss': 0.0, 'learning_rate': 4.46e-05, 'epoch': 0.01} 55%|█████▌ | 277/500 [01:16<01:15, 2.97it/s] 56%|█████▌ | 278/500 [01:17<01:08, 3.26it/s] {'loss': 0.0, 'learning_rate': 4.44e-05, 'epoch': 0.01} 56%|█████▌ | 278/500 [01:17<01:08, 3.26it/s] 56%|█████▌ | 279/500 [01:17<01:02, 3.52it/s] {'loss': 0.0, 'learning_rate': 4.4200000000000004e-05, 'epoch': 0.01} 56%|█████▌ | 279/500 [01:17<01:02, 3.52it/s] 56%|█████▌ | 280/500 [01:17<00:59, 3.69it/s] {'loss': 0.0, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.01} 56%|█████▌ | 280/500 [01:17<00:59, 3.69it/s] 56%|█████▌ | 281/500 [01:17<00:56, 3.85it/s] {'loss': 0.0, 'learning_rate': 4.38e-05, 'epoch': 0.01} 56%|█████▌ | 281/500 [01:17<00:56, 3.85it/s] 56%|█████▋ | 282/500 [01:17<00:55, 3.96it/s] {'loss': 0.0, 'learning_rate': 4.36e-05, 'epoch': 0.01} 56%|█████▋ | 282/500 [01:17<00:55, 3.96it/s] 57%|█████▋ | 283/500 [01:18<00:53, 4.05it/s] {'loss': 0.0, 'learning_rate': 4.3400000000000005e-05, 'epoch': 0.01} 57%|█████▋ | 283/500 [01:18<00:53, 4.05it/s] 57%|█████▋ | 284/500 [01:18<00:52, 4.12it/s] {'loss': 0.0, 'learning_rate': 4.32e-05, 'epoch': 0.01} 57%|█████▋ | 284/500 
[01:18<00:52, 4.12it/s] 57%|█████▋ | 285/500 [01:18<00:51, 4.17it/s] {'loss': 0.0, 'learning_rate': 4.3e-05, 'epoch': 0.01} 57%|█████▋ | 285/500 [01:18<00:51, 4.17it/s] 57%|█████▋ | 286/500 [01:18<00:50, 4.21it/s] {'loss': 0.0, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.01} 57%|█████▋ | 286/500 [01:18<00:50, 4.21it/s] 57%|█████▋ | 287/500 [01:19<00:50, 4.23it/s] {'loss': 0.0, 'learning_rate': 4.26e-05, 'epoch': 0.01} 57%|█████▋ | 287/500 [01:19<00:50, 4.23it/s] 58%|█████▊ | 288/500 [01:19<00:50, 4.23it/s] {'loss': 0.0, 'learning_rate': 4.24e-05, 'epoch': 0.01} 58%|█████▊ | 288/500 [01:19<00:50, 4.23it/s] 58%|█████▊ | 289/500 [01:19<00:49, 4.25it/s] {'loss': 0.0, 'learning_rate': 4.22e-05, 'epoch': 0.01} 58%|█████▊ | 289/500 [01:19<00:49, 4.25it/s] 58%|█████▊ | 290/500 [01:19<00:49, 4.26it/s] {'loss': 0.0, 'learning_rate': 4.2e-05, 'epoch': 0.01} 58%|█████▊ | 290/500 [01:19<00:49, 4.26it/s] 58%|█████▊ | 291/500 [01:20<00:48, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.18e-05, 'epoch': 0.01} 58%|█████▊ | 291/500 [01:20<00:48, 4.29it/s] 58%|█████▊ | 292/500 [01:20<00:48, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.16e-05, 'epoch': 0.01} 58%|█████▊ | 292/500 [01:20<00:48, 4.27it/s] 59%|█████▊ | 293/500 [01:20<00:48, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.14e-05, 'epoch': 0.01} 59%|█████▊ | 293/500 [01:20<00:48, 4.28it/s] 59%|█████▉ | 294/500 [01:20<00:48, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.12e-05, 'epoch': 0.01} 59%|█████▉ | 294/500 [01:20<00:48, 4.27it/s] 59%|█████▉ | 295/500 [01:21<00:47, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.1e-05, 'epoch': 0.01} 59%|█████▉ | 295/500 [01:21<00:47, 4.27it/s] 59%|█████▉ | 296/500 [01:21<00:47, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.08e-05, 'epoch': 0.01} 59%|█████▉ | 296/500 [01:21<00:47, 4.29it/s] 59%|█████▉ | 297/500 [01:21<00:47, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.0600000000000004e-05, 'epoch': 0.01} 59%|█████▉ | 297/500 [01:21<00:47, 4.29it/s] 60%|█████▉ | 298/500 [01:21<00:47, 4.28it/s] 
{'loss': 0.0, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.01} 60%|█████▉ | 298/500 [01:21<00:47, 4.28it/s] 60%|█████▉ | 299/500 [01:21<00:47, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.02e-05, 'epoch': 0.01} 60%|█████▉ | 299/500 [01:21<00:47, 4.27it/s] 60%|██████ | 300/500 [01:22<00:46, 4.27it/s] {'loss': 0.0, 'learning_rate': 4e-05, 'epoch': 0.01} 60%|██████ | 300/500 [01:22<00:46, 4.27it/s][INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:45,964 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-300/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:45,964 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-300/special_tokens_map.json 60%|██████ | 301/500 [01:22<00:50, 3.94it/s] {'loss': 0.0, 'learning_rate': 3.9800000000000005e-05, 'epoch': 0.01} 60%|██████ | 301/500 [01:22<00:50, 3.94it/s] 60%|██████ | 302/500 [01:22<00:48, 4.04it/s] {'loss': 0.0, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.01} 60%|██████ | 302/500 [01:22<00:48, 4.04it/s] 61%|██████ | 303/500 [01:22<00:47, 4.11it/s] {'loss': 0.0, 'learning_rate': 3.94e-05, 'epoch': 0.01} 61%|██████ | 303/500 [01:22<00:47, 4.11it/s] 61%|██████ | 304/500 [01:23<00:47, 4.17it/s] {'loss': 0.0, 'learning_rate': 3.9200000000000004e-05, 'epoch': 0.01} 61%|██████ | 304/500 [01:23<00:47, 4.17it/s] 61%|██████ | 305/500 [01:23<00:46, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.01} 61%|██████ | 305/500 [01:23<00:46, 4.21it/s] 61%|██████ | 306/500 [01:23<00:45, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.88e-05, 'epoch': 0.01} 61%|██████ | 306/500 [01:23<00:45, 4.23it/s] 61%|██████▏ | 307/500 [01:23<00:45, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.86e-05, 'epoch': 0.01} 61%|██████▏ | 307/500 [01:23<00:45, 4.25it/s] 62%|██████▏ | 308/500 [01:24<00:45, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.8400000000000005e-05, 'epoch': 0.01} 62%|██████▏ | 308/500 [01:24<00:45, 4.26it/s] 62%|██████▏ | 309/500 
[01:24<00:45, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.82e-05, 'epoch': 0.01} 62%|██████▏ | 309/500 [01:24<00:45, 4.24it/s] 62%|██████▏ | 310/500 [01:24<00:44, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.8e-05, 'epoch': 0.01} 62%|██████▏ | 310/500 [01:24<00:44, 4.25it/s] 62%|██████▏ | 311/500 [01:24<00:44, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.7800000000000004e-05, 'epoch': 0.01} 62%|██████▏ | 311/500 [01:24<00:44, 4.25it/s] 62%|██████▏ | 312/500 [01:25<00:44, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.76e-05, 'epoch': 0.01} 62%|██████▏ | 312/500 [01:25<00:44, 4.24it/s] 63%|██████▎ | 313/500 [01:25<00:44, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.74e-05, 'epoch': 0.01} 63%|██████▎ | 313/500 [01:25<00:44, 4.24it/s] 63%|██████▎ | 314/500 [01:25<00:43, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.72e-05, 'epoch': 0.01} 63%|██████▎ | 314/500 [01:25<00:43, 4.26it/s] 63%|██████▎ | 315/500 [01:25<00:43, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.7e-05, 'epoch': 0.01} 63%|██████▎ | 315/500 [01:25<00:43, 4.25it/s] 63%|██████▎ | 316/500 [01:26<00:43, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.68e-05, 'epoch': 0.01} 63%|██████▎ | 316/500 [01:26<00:43, 4.24it/s] 63%|██████▎ | 317/500 [01:26<00:43, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.66e-05, 'epoch': 0.01} 63%|██████▎ | 317/500 [01:26<00:43, 4.25it/s] 64%|██████▎ | 318/500 [01:26<00:43, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.6400000000000004e-05, 'epoch': 0.01} 64%|██████▎ | 318/500 [01:26<00:43, 4.23it/s] 64%|██████▍ | 319/500 [01:26<00:42, 4.22it/s] {'loss': 0.0, 'learning_rate': 3.62e-05, 'epoch': 0.01} 64%|██████▍ | 319/500 [01:26<00:42, 4.22it/s] 64%|██████▍ | 320/500 [01:26<00:42, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.6e-05, 'epoch': 0.01} 64%|██████▍ | 320/500 [01:26<00:42, 4.24it/s] 64%|██████▍ | 321/500 [01:27<00:42, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.58e-05, 'epoch': 0.01} 64%|██████▍ | 321/500 [01:27<00:42, 4.26it/s] 64%|██████▍ | 322/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 
'learning_rate': 3.56e-05, 'epoch': 0.01} 64%|██████▍ | 322/500 [01:27<00:41, 4.26it/s] 65%|██████▍ | 323/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.54e-05, 'epoch': 0.01} 65%|██████▍ | 323/500 [01:27<00:41, 4.26it/s] 65%|██████▍ | 324/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.52e-05, 'epoch': 0.01} 65%|██████▍ | 324/500 [01:27<00:41, 4.26it/s] 65%|██████▌ | 325/500 [01:28<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.5e-05, 'epoch': 0.01} 65%|██████▌ | 325/500 [01:28<00:41, 4.26it/s] 65%|██████▌ | 326/500 [01:28<00:40, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.48e-05, 'epoch': 0.01} 65%|██████▌ | 326/500 [01:28<00:40, 4.28it/s] 65%|██████▌ | 327/500 [01:28<00:40, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.46e-05, 'epoch': 0.01} 65%|██████▌ | 327/500 [01:28<00:40, 4.29it/s] 66%|██████▌ | 328/500 [01:28<00:40, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4399999999999996e-05, 'epoch': 0.01} 66%|██████▌ | 328/500 [01:28<00:40, 4.29it/s] 66%|██████▌ | 329/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4200000000000005e-05, 'epoch': 0.01} 66%|██████▌ | 329/500 [01:29<00:39, 4.29it/s] 66%|██████▌ | 330/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.01} 66%|██████▌ | 330/500 [01:29<00:39, 4.29it/s] 66%|██████▌ | 331/500 [01:29<00:39, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.38e-05, 'epoch': 0.01} 66%|██████▌ | 331/500 [01:29<00:39, 4.28it/s] 66%|██████▋ | 332/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.3600000000000004e-05, 'epoch': 0.01} 66%|██████▋ | 332/500 [01:29<00:39, 4.29it/s] 67%|██████▋ | 333/500 [01:30<00:39, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.3400000000000005e-05, 'epoch': 0.01} 67%|██████▋ | 333/500 [01:30<00:39, 4.27it/s] 67%|██████▋ | 334/500 [01:30<00:39, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.32e-05, 'epoch': 0.01} 67%|██████▋ | 334/500 [01:30<00:39, 4.25it/s] 67%|██████▋ | 335/500 [01:30<00:38, 4.24it/s] {'loss': 0.0, 
'learning_rate': 3.3e-05, 'epoch': 0.01} 67%|██████▋ | 335/500 [01:30<00:38, 4.24it/s] 67%|██████▋ | 336/500 [01:30<00:38, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.2800000000000004e-05, 'epoch': 0.01} 67%|██████▋ | 336/500 [01:30<00:38, 4.23it/s] 67%|██████▋ | 337/500 [01:30<00:38, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.26e-05, 'epoch': 0.01} 67%|██████▋ | 337/500 [01:30<00:38, 4.25it/s] 68%|██████▊ | 338/500 [01:31<00:38, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.24e-05, 'epoch': 0.01} 68%|██████▊ | 338/500 [01:31<00:38, 4.23it/s] 68%|██████▊ | 339/500 [01:31<00:38, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.2200000000000003e-05, 'epoch': 0.01} 68%|██████▊ | 339/500 [01:31<00:38, 4.21it/s] 68%|██████▊ | 340/500 [01:31<00:37, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.01} 68%|██████▊ | 340/500 [01:31<00:37, 4.23it/s] 68%|██████▊ | 341/500 [01:31<00:37, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.18e-05, 'epoch': 0.01} 68%|██████▊ | 341/500 [01:31<00:37, 4.21it/s] 68%|██████▊ | 342/500 [01:32<00:37, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.16e-05, 'epoch': 0.01} 68%|██████▊ | 342/500 [01:32<00:37, 4.21it/s] 69%|██████▊ | 343/500 [01:32<00:37, 4.22it/s] {'loss': 0.0, 'learning_rate': 3.1400000000000004e-05, 'epoch': 0.01} 69%|██████▊ | 343/500 [01:32<00:37, 4.22it/s] 69%|██████▉ | 344/500 [01:32<00:36, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.12e-05, 'epoch': 0.01} 69%|██████▉ | 344/500 [01:32<00:36, 4.23it/s] 69%|██████▉ | 345/500 [01:32<00:36, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.1e-05, 'epoch': 0.01} 69%|██████▉ | 345/500 [01:32<00:36, 4.26it/s] 69%|██████▉ | 346/500 [01:33<00:36, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.08e-05, 'epoch': 0.01} 69%|██████▉ | 346/500 [01:33<00:36, 4.28it/s] 69%|██████▉ | 347/500 [01:33<00:35, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.06e-05, 'epoch': 0.01} 69%|██████▉ | 347/500 [01:33<00:35, 4.28it/s] 70%|██████▉ | 348/500 [01:33<00:35, 4.27it/s] {'loss': 0.0, 'learning_rate': 
3.04e-05, 'epoch': 0.01} 70%|██████▉ | 348/500 [01:33<00:35, 4.27it/s] 70%|██████▉ | 349/500 [01:33<00:35, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.02e-05, 'epoch': 0.01} 70%|██████▉ | 349/500 [01:33<00:35, 4.27it/s] 70%|███████ | 350/500 [01:34<00:35, 4.28it/s] {'loss': 0.0, 'learning_rate': 3e-05, 'epoch': 0.01} 70%|███████ | 350/500 [01:34<00:35, 4.28it/s][INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:57,758 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-350/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:57,758 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-350/special_tokens_map.json 70%|███████ | 351/500 [01:34<00:37, 3.96it/s] {'loss': 0.0, 'learning_rate': 2.98e-05, 'epoch': 0.01} 70%|███████ | 351/500 [01:34<00:37, 3.96it/s] 70%|███████ | 352/500 [01:34<00:36, 4.05it/s] {'loss': 0.0, 'learning_rate': 2.96e-05, 'epoch': 0.01} 70%|███████ | 352/500 [01:34<00:36, 4.05it/s] 71%|███████ | 353/500 [01:34<00:35, 4.12it/s] {'loss': 0.0, 'learning_rate': 2.94e-05, 'epoch': 0.01} 71%|███████ | 353/500 [01:34<00:35, 4.12it/s] 71%|███████ | 354/500 [01:35<00:35, 4.13it/s] {'loss': 0.0, 'learning_rate': 2.9199999999999998e-05, 'epoch': 0.01} 71%|███████ | 354/500 [01:35<00:35, 4.13it/s] 71%|███████ | 355/500 [01:35<00:34, 4.17it/s] {'loss': 0.0, 'learning_rate': 2.9e-05, 'epoch': 0.01} 71%|███████ | 355/500 [01:35<00:34, 4.17it/s] 71%|███████ | 356/500 [01:35<00:34, 4.21it/s] {'loss': 0.0, 'learning_rate': 2.88e-05, 'epoch': 0.01} 71%|███████ | 356/500 [01:35<00:34, 4.21it/s] 71%|███████▏ | 357/500 [01:35<00:33, 4.21it/s] {'loss': 0.0, 'learning_rate': 2.86e-05, 'epoch': 0.01} 71%|███████▏ | 357/500 [01:35<00:33, 4.21it/s] 72%|███████▏ | 358/500 [01:35<00:33, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.84e-05, 'epoch': 0.01} 72%|███████▏ | 358/500 [01:35<00:33, 4.22it/s] 72%|███████▏ | 359/500 [01:36<00:33, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.8199999999999998e-05, 
'epoch': 0.01} 72%|███████▏ | 359/500 [01:36<00:33, 4.24it/s] 72%|███████▏ | 360/500 [01:36<00:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.01} 72%|███████▏ | 360/500 [01:36<00:33, 4.23it/s] 72%|███████▏ | 361/500 [01:36<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7800000000000005e-05, 'epoch': 0.01} 72%|███████▏ | 361/500 [01:36<00:32, 4.24it/s] 72%|███████▏ | 362/500 [01:36<00:32, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.7600000000000003e-05, 'epoch': 0.01} 72%|███████▏ | 362/500 [01:36<00:32, 4.25it/s] 73%|███████▎ | 363/500 [01:37<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7400000000000002e-05, 'epoch': 0.01} 73%|███████▎ | 363/500 [01:37<00:32, 4.24it/s] 73%|███████▎ | 364/500 [01:37<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7200000000000004e-05, 'epoch': 0.01} 73%|███████▎ | 364/500 [01:37<00:32, 4.24it/s] 73%|███████▎ | 365/500 [01:37<00:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.01} 73%|███████▎ | 365/500 [01:37<00:31, 4.27it/s] 73%|███████▎ | 366/500 [01:37<00:31, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.6800000000000004e-05, 'epoch': 0.01} 73%|███████▎ | 366/500 [01:37<00:31, 4.28it/s] 73%|███████▎ | 367/500 [01:38<00:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.6600000000000003e-05, 'epoch': 0.01} 73%|███████▎ | 367/500 [01:38<00:31, 4.27it/s] 74%|███████▎ | 368/500 [01:38<00:30, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.64e-05, 'epoch': 0.01} 74%|███████▎ | 368/500 [01:38<00:30, 4.28it/s] 74%|███████▍ | 369/500 [01:38<00:30, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.6200000000000003e-05, 'epoch': 0.01} 74%|███████▍ | 369/500 [01:38<00:30, 4.27it/s] 74%|███████▍ | 370/500 [01:38<00:30, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.01} 74%|███████▍ | 370/500 [01:38<00:30, 4.26it/s] 74%|███████▍ | 371/500 [01:38<00:30, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.58e-05, 'epoch': 0.01} 74%|███████▍ | 371/500 [01:39<00:30, 
4.26it/s] 74%|███████▍ | 372/500 [01:39<00:29, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.5600000000000002e-05, 'epoch': 0.01} 74%|███████▍ | 372/500 [01:39<00:29, 4.27it/s] 75%|███████▍ | 373/500 [01:39<00:29, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.54e-05, 'epoch': 0.01} 75%|███████▍ | 373/500 [01:39<00:29, 4.26it/s] 75%|███████▍ | 374/500 [01:39<00:29, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.5200000000000003e-05, 'epoch': 0.01} 75%|███████▍ | 374/500 [01:39<00:29, 4.26it/s] 75%|███████▌ | 375/500 [01:39<00:29, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.5e-05, 'epoch': 0.01} 75%|███████▌ | 375/500 [01:39<00:29, 4.27it/s] 75%|███████▌ | 376/500 [01:40<00:28, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.48e-05, 'epoch': 0.01} 75%|███████▌ | 376/500 [01:40<00:28, 4.28it/s] 75%|███████▌ | 377/500 [01:40<00:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.46e-05, 'epoch': 0.01} 75%|███████▌ | 377/500 [01:40<00:28, 4.26it/s] 76%|███████▌ | 378/500 [01:40<00:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.44e-05, 'epoch': 0.01} 76%|███████▌ | 378/500 [01:40<00:28, 4.26it/s] 76%|███████▌ | 379/500 [01:40<00:28, 4.23it/s] {'loss': 0.0, 'learning_rate': 2.4200000000000002e-05, 'epoch': 0.01} 76%|███████▌ | 379/500 [01:40<00:28, 4.23it/s] 76%|███████▌ | 380/500 [01:41<00:28, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.4e-05, 'epoch': 0.01} 76%|███████▌ | 380/500 [01:41<00:28, 4.22it/s] 76%|███████▌ | 381/500 [01:41<00:28, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.38e-05, 'epoch': 0.01} 76%|███████▌ | 381/500 [01:41<00:28, 4.22it/s] 76%|███████▋ | 382/500 [01:41<00:27, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.36e-05, 'epoch': 0.01} 76%|███████▋ | 382/500 [01:41<00:27, 4.22it/s] 77%|███████▋ | 383/500 [01:41<00:27, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.3400000000000003e-05, 'epoch': 0.01} 77%|███████▋ | 383/500 [01:41<00:27, 4.25it/s] 77%|███████▋ | 384/500 [01:42<00:27, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.32e-05, 'epoch': 0.01} 77%|███████▋ | 384/500 
[01:42<00:27, 4.25it/s] 77%|███████▋ | 385/500 [01:42<00:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.01} 77%|███████▋ | 385/500 [01:42<00:26, 4.27it/s] 77%|███████▋ | 386/500 [01:42<00:26, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.2800000000000002e-05, 'epoch': 0.01} 77%|███████▋ | 386/500 [01:42<00:26, 4.28it/s] 77%|███████▋ | 387/500 [01:42<00:26, 4.30it/s] {'loss': 0.0, 'learning_rate': 2.26e-05, 'epoch': 0.01} 77%|███████▋ | 387/500 [01:42<00:26, 4.30it/s] 78%|███████▊ | 388/500 [01:42<00:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.2400000000000002e-05, 'epoch': 0.01} 78%|███████▊ | 388/500 [01:42<00:26, 4.27it/s] 78%|███████▊ | 389/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.22e-05, 'epoch': 0.01} 78%|███████▊ | 389/500 [01:43<00:25, 4.29it/s] 78%|███████▊ | 390/500 [01:43<00:25, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.01} 78%|███████▊ | 390/500 [01:43<00:25, 4.28it/s] 78%|███████▊ | 391/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.18e-05, 'epoch': 0.02} 78%|███████▊ | 391/500 [01:43<00:25, 4.29it/s] 78%|███████▊ | 392/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.16e-05, 'epoch': 0.02} 78%|███████▊ | 392/500 [01:43<00:25, 4.29it/s] 79%|███████▊ | 393/500 [01:44<00:24, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.1400000000000002e-05, 'epoch': 0.02} 79%|███████▊ | 393/500 [01:44<00:24, 4.29it/s] 79%|███████▉ | 394/500 [01:44<00:24, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.12e-05, 'epoch': 0.02} 79%|███████▉ | 394/500 [01:44<00:24, 4.28it/s] 79%|███████▉ | 395/500 [01:44<00:24, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.1e-05, 'epoch': 0.02} 79%|███████▉ | 395/500 [01:44<00:24, 4.28it/s] 79%|███████▉ | 396/500 [01:44<00:24, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.08e-05, 'epoch': 0.02} 79%|███████▉ | 396/500 [01:44<00:24, 4.29it/s] 79%|███████▉ | 397/500 [01:45<00:23, 4.30it/s] {'loss': 0.0, 'learning_rate': 2.06e-05, 'epoch': 0.02} 
79%|███████▉ | 397/500 [01:45<00:23, 4.30it/s] 80%|███████▉ | 398/500 [01:45<00:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.04e-05, 'epoch': 0.02} 80%|███████▉ | 398/500 [01:45<00:23, 4.26it/s] 80%|███████▉ | 399/500 [01:45<00:23, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.0200000000000003e-05, 'epoch': 0.02} 80%|███████▉ | 399/500 [01:45<00:23, 4.27it/s] 80%|████████ | 400/500 [01:45<00:23, 4.28it/s] {'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.02} 80%|████████ | 400/500 [01:45<00:23, 4.28it/s][INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:09,545 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-400/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:09,545 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-400/special_tokens_map.json 80%|████████ | 401/500 [01:46<00:25, 3.95it/s] {'loss': 0.0, 'learning_rate': 1.9800000000000004e-05, 'epoch': 0.02} 80%|████████ | 401/500 [01:46<00:25, 3.95it/s] 80%|████████ | 402/500 [01:46<00:24, 4.04it/s] {'loss': 0.0, 'learning_rate': 1.9600000000000002e-05, 'epoch': 0.02} 80%|████████ | 402/500 [01:46<00:24, 4.04it/s] 81%|████████ | 403/500 [01:46<00:23, 4.11it/s] {'loss': 0.0, 'learning_rate': 1.94e-05, 'epoch': 0.02} 81%|████████ | 403/500 [01:46<00:23, 4.11it/s] 81%|████████ | 404/500 [01:46<00:23, 4.17it/s] {'loss': 0.0, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.02} 81%|████████ | 404/500 [01:46<00:23, 4.17it/s] 81%|████████ | 405/500 [01:47<00:22, 4.20it/s] {'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.02} 81%|████████ | 405/500 [01:47<00:22, 4.20it/s] 81%|████████ | 406/500 [01:47<00:22, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.88e-05, 'epoch': 0.02} 81%|████████ | 406/500 [01:47<00:22, 4.23it/s] 81%|████████▏ | 407/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.86e-05, 'epoch': 0.02} 81%|████████▏ | 407/500 [01:47<00:21, 4.25it/s] 82%|████████▏ | 408/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 
'learning_rate': 1.84e-05, 'epoch': 0.02}
82%|████████▏ | 409/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.8200000000000002e-05, 'epoch': 0.02}
82%|████████▏ | 410/500 [01:48<00:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.02}
82%|████████▏ | 411/500 [01:48<00:20, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.78e-05, 'epoch': 0.02}
82%|████████▏ | 412/500 [01:48<00:21, 4.04it/s] {'loss': 0.0, 'learning_rate': 1.76e-05, 'epoch': 0.02}
83%|████████▎ | 413/500 [01:49<00:23, 3.67it/s] {'loss': 0.0, 'learning_rate': 1.74e-05, 'epoch': 0.02}
83%|████████▎ | 414/500 [01:49<00:24, 3.58it/s] {'loss': 0.0, 'learning_rate': 1.7199999999999998e-05, 'epoch': 0.02}
83%|████████▎ | 415/500 [01:49<00:25, 3.35it/s] {'loss': 0.0, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.02}
83%|████████▎ | 416/500 [01:49<00:23, 3.59it/s] {'loss': 0.0, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.02}
83%|████████▎ | 417/500 [01:50<00:22, 3.77it/s] {'loss': 0.0, 'learning_rate': 1.66e-05, 'epoch': 0.02}
84%|████████▎ | 418/500 [01:50<00:21, 3.90it/s] {'loss': 0.0, 'learning_rate': 1.6400000000000002e-05, 'epoch': 0.02}
84%|████████▍ | 419/500 [01:50<00:20, 4.00it/s] {'loss': 0.0, 'learning_rate': 1.62e-05, 'epoch': 0.02}
84%|████████▍ | 420/500 [01:50<00:19, 4.08it/s] {'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.02}
84%|████████▍ | 421/500 [01:51<00:19, 4.15it/s] {'loss': 0.0, 'learning_rate': 1.58e-05, 'epoch': 0.02}
84%|████████▍ | 422/500 [01:51<00:18, 4.17it/s] {'loss': 0.0, 'learning_rate': 1.56e-05, 'epoch': 0.02}
85%|████████▍ | 423/500 [01:51<00:18, 4.21it/s] {'loss': 0.0, 'learning_rate': 1.54e-05, 'epoch': 0.02}
85%|████████▍ | 424/500 [01:51<00:17, 4.22it/s] {'loss': 0.0, 'learning_rate': 1.52e-05, 'epoch': 0.02}
85%|████████▌ | 425/500 [01:52<00:17, 4.21it/s] {'loss': 0.0, 'learning_rate': 1.5e-05, 'epoch': 0.02}
85%|████████▌ | 426/500 [01:52<00:17, 4.22it/s] {'loss': 0.0, 'learning_rate': 1.48e-05, 'epoch': 0.02}
85%|████████▌ | 427/500 [01:52<00:17, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.4599999999999999e-05, 'epoch': 0.02}
86%|████████▌ | 428/500 [01:52<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.44e-05, 'epoch': 0.02}
86%|████████▌ | 429/500 [01:52<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.42e-05, 'epoch': 0.02}
86%|████████▌ | 430/500 [01:53<00:16, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.02}
86%|████████▌ | 431/500 [01:53<00:16, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.3800000000000002e-05, 'epoch': 0.02}
86%|████████▋ | 432/500 [01:53<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.3600000000000002e-05, 'epoch': 0.02}
87%|████████▋ | 433/500 [01:53<00:15, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.3400000000000002e-05, 'epoch': 0.02}
87%|████████▋ | 434/500 [01:54<00:15, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.32e-05, 'epoch': 0.02}
87%|████████▋ | 435/500 [01:54<00:15, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.02}
87%|████████▋ | 436/500 [01:54<00:15, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.2800000000000001e-05, 'epoch': 0.02}
87%|████████▋ | 437/500 [01:54<00:14, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.2600000000000001e-05, 'epoch': 0.02}
88%|████████▊ | 438/500 [01:55<00:14, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.24e-05, 'epoch': 0.02}
88%|████████▊ | 439/500 [01:55<00:14, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.22e-05, 'epoch': 0.02}
88%|████████▊ | 440/500 [01:55<00:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.2e-05, 'epoch': 0.02}
88%|████████▊ | 441/500 [01:55<00:13, 4.28it/s] {'loss': 0.0, 'learning_rate': 1.18e-05, 'epoch': 0.02}
88%|████████▊ | 442/500 [01:56<00:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.16e-05, 'epoch': 0.02}
89%|████████▊ | 443/500 [01:56<00:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1400000000000001e-05, 'epoch': 0.02}
89%|████████▉ | 444/500 [01:56<00:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1200000000000001e-05, 'epoch': 0.02}
89%|████████▉ | 445/500 [01:56<00:12, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.02}
89%|████████▉ | 446/500 [01:56<00:12, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.08e-05, 'epoch': 0.02}
89%|████████▉ | 447/500 [01:57<00:12, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.06e-05, 'epoch': 0.02}
90%|████████▉ | 448/500 [01:57<00:12, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.04e-05, 'epoch': 0.02}
90%|████████▉ | 449/500 [01:57<00:11, 4.28it/s] {'loss': 0.0, 'learning_rate': 1.02e-05, 'epoch': 0.02}
90%|█████████ | 450/500 [01:57<00:11, 4.26it/s] {'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.02}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:21,641 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-450/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:21,641 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-450/special_tokens_map.json
90%|█████████ | 451/500 [01:58<00:12, 4.00it/s] {'loss': 0.0, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.02}
90%|█████████ | 452/500 [01:58<00:11, 4.09it/s] {'loss': 0.0, 'learning_rate': 9.600000000000001e-06, 'epoch': 0.02}
91%|█████████ | 453/500 [01:58<00:11, 4.17it/s] {'loss': 0.0, 'learning_rate': 9.4e-06, 'epoch': 0.02}
91%|█████████ | 454/500 [01:58<00:10, 4.19it/s] {'loss': 0.0, 'learning_rate': 9.2e-06, 'epoch': 0.02}
91%|█████████ | 455/500 [01:59<00:10, 4.20it/s] {'loss': 0.0, 'learning_rate': 9e-06, 'epoch': 0.02}
91%|█████████ | 456/500 [01:59<00:10, 4.23it/s] {'loss': 0.0, 'learning_rate': 8.8e-06, 'epoch': 0.02}
91%|█████████▏| 457/500 [01:59<00:10, 4.24it/s] {'loss': 0.0, 'learning_rate': 8.599999999999999e-06, 'epoch': 0.02}
92%|█████████▏| 458/500 [01:59<00:09, 4.26it/s] {'loss': 0.0, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.02}
92%|█████████▏| 459/500 [02:00<00:09, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.200000000000001e-06, 'epoch': 0.02}
92%|█████████▏| 460/500 [02:00<00:09, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
92%|█████████▏| 461/500 [02:00<00:09, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.8e-06, 'epoch': 0.02}
92%|█████████▏| 462/500 [02:00<00:08, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.6e-06, 'epoch': 0.02}
93%|█████████▎| 463/500 [02:00<00:08, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.4e-06, 'epoch': 0.02}
93%|█████████▎| 464/500 [02:01<00:08, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.2e-06, 'epoch': 0.02}
93%|█████████▎| 465/500 [02:01<00:08, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.02}
93%|█████████▎| 466/500 [02:01<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.02}
93%|█████████▎| 467/500 [02:01<00:07, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.6e-06, 'epoch': 0.02}
94%|█████████▎| 468/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.4000000000000006e-06, 'epoch': 0.02}
94%|█████████▍| 469/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.2e-06, 'epoch': 0.02}
94%|█████████▍| 470/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6e-06, 'epoch': 0.02}
94%|█████████▍| 471/500 [02:02<00:06, 4.26it/s] {'loss': 0.0, 'learning_rate': 5.8e-06, 'epoch': 0.02}
94%|█████████▍| 472/500 [02:03<00:06, 4.27it/s] {'loss': 0.0, 'learning_rate': 5.600000000000001e-06, 'epoch': 0.02}
95%|█████████▍| 473/500 [02:03<00:06, 4.27it/s] {'loss': 0.0, 'learning_rate': 5.4e-06, 'epoch': 0.02}
95%|█████████▍| 474/500 [02:03<00:06, 4.28it/s] {'loss': 0.0, 'learning_rate': 5.2e-06, 'epoch': 0.02}
95%|█████████▌| 475/500 [02:03<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 5e-06, 'epoch': 0.02}
95%|█████████▌| 476/500 [02:04<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.02}
95%|█████████▌| 477/500 [02:04<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.6e-06, 'epoch': 0.02}
96%|█████████▌| 478/500 [02:04<00:05, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.4e-06, 'epoch': 0.02}
96%|█████████▌| 479/500 [02:04<00:04, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.02}
96%|█████████▌| 480/500 [02:04<00:04, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.02}
96%|█████████▌| 481/500 [02:05<00:04, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.8e-06, 'epoch': 0.02}
96%|█████████▋| 482/500 [02:05<00:04, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.6e-06, 'epoch': 0.02}
97%|█████████▋| 483/500 [02:05<00:03, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.02}
97%|█████████▋| 484/500 [02:05<00:03, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.02}
97%|█████████▋| 485/500 [02:06<00:03, 4.23it/s] {'loss': 0.0, 'learning_rate': 3e-06, 'epoch': 0.02}
97%|█████████▋| 486/500 [02:06<00:03, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-06, 'epoch': 0.02}
97%|█████████▋| 487/500 [02:06<00:03, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.6e-06, 'epoch': 0.02}
98%|█████████▊| 488/500 [02:06<00:02, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.02}
98%|█████████▊| 489/500 [02:07<00:02, 4.19it/s] {'loss': 0.0, 'learning_rate': 2.2e-06, 'epoch': 0.02}
98%|█████████▊| 490/500 [02:07<00:02, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.02}
98%|█████████▊| 491/500 [02:07<00:02, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.8e-06, 'epoch': 0.02}
98%|█████████▊| 492/500 [02:07<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.02}
99%|█████████▊| 493/500 [02:08<00:01, 4.19it/s] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-06, 'epoch': 0.02}
99%|█████████▉| 494/500 [02:08<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.2000000000000002e-06, 'epoch': 0.02}
99%|█████████▉| 495/500 [02:08<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.02}
99%|█████████▉| 496/500 [02:08<00:00, 4.17it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.02}
99%|█████████▉| 497/500 [02:09<00:00, 4.19it/s] {'loss': 0.0, 'learning_rate': 6.000000000000001e-07, 'epoch': 0.02}
100%|█████████▉| 498/500 [02:09<00:00, 4.20it/s] {'loss': 0.0, 'learning_rate': 4.0000000000000003e-07, 'epoch': 0.02}
100%|█████████▉| 499/500 [02:09<00:00, 4.19it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000002e-07, 'epoch': 0.02}
100%|██████████| 500/500 [02:09<00:00, 4.20it/s] {'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:33,483 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:33,483 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-500/special_tokens_map.json
[INFO|trainer.py:1955] 2023-12-09 15:29:33,536 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 130.5539, 'train_samples_per_second': 7.66, 'train_steps_per_second': 3.83, 'train_loss': 0.00219873046875, 'epoch': 0.02}
100%|██████████| 500/500 [02:09<00:00, 3.85it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:33,559 >> tokenizer config file saved in output/text-20231209-152643-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:33,559 >> Special tokens file saved in output/text-20231209-152643-1e-4/special_tokens_map.json