[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING]
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] *****************************************
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-12-09 15:26:45,478] torch.distributed.run: [WARNING] *****************************************
12/09/2023 15:26:57 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
12/09/2023 15:26:57 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/text-20231209-152643-1e-4/runs/Dec09_15-26-54_lily-gpu07,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=500,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=output/text-20231209-152643-1e-4,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/text-20231209-152643-1e-4,
save_on_each_node=False,
save_safetensors=True,
save_steps=50,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
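The arguments above combine learning_rate=0.0001, lr_scheduler_type=linear, warmup_steps=0 and max_steps=500. As a rough sketch of what that implies, the snippet below computes the learning rate at a given optimizer step, assuming the usual "linear warmup, then linear decay to zero" rule. The function name `linear_lr` is hypothetical and is not part of the training script.

```python
def linear_lr(step: int, base_lr: float = 1e-4,
              max_steps: int = 500, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates.

    Assumes linear warmup over `warmup_steps`, then linear decay to zero
    at `max_steps`. Default values are copied from the logged arguments.
    """
    if step < warmup_steps:
        # Warmup phase (not used in this run, since warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


for step in (0, 250, 499, 500):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at the full 1e-4, is halved by step 250, and reaches zero at step 500.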
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer.model from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer.model
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer_config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2022] 2023-12-09 15:26:58,100 >> loading file tokenizer.json from cache at None
[INFO|configuration_utils.py:717] 2023-12-09 15:26:58,534 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
12/09/2023 15:26:58 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
[INFO|configuration_utils.py:717] 2023-12-09 15:26:58,785 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:777] 2023-12-09 15:26:58,786 >> Model config ChatGLMConfig {
"_name_or_path": "THUDM/chatglm3-6b-base",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "THUDM/chatglm3-6b-base--configuration_chatglm.ChatGLMConfig",
"AutoModel": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": 2,
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1e-05,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 28,
"original_rope": true,
"pad_token_id": 0,
"padded_vocab_size": 65024,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"rmsnorm": true,
"seq_length": 32768,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.35.2",
"use_cache": true,
"vocab_size": 65024
}
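The config above sets multi_query_attention=true with multi_query_group_num=2, so only 2 key/value groups are kept instead of one per attention head (32). The sketch below estimates what that means for key/value cache size. It assumes the cache holds one key and one value vector of kv_channels elements per group per layer in float16 (matching torch_dtype), and it ignores batch size and framework overhead. The helper name `kv_cache_bytes_per_token` is hypothetical.

```python
# Values copied from the logged ChatGLMConfig.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "kv_channels": 128,
    "multi_query_group_num": 2,
    "num_layers": 28,
    "seq_length": 32768,
}
BYTES_PER_VALUE = 2  # float16, per "torch_dtype" in the config

# Per-head width implied by the config; it matches kv_channels.
head_dim = config["hidden_size"] // config["num_attention_heads"]


def kv_cache_bytes_per_token(num_kv_heads: int) -> int:
    """Bytes of K and V stored per token across all layers (assumed layout)."""
    return (2 * config["num_layers"] * num_kv_heads
            * config["kv_channels"] * BYTES_PER_VALUE)


grouped = kv_cache_bytes_per_token(config["multi_query_group_num"])
full = kv_cache_bytes_per_token(config["num_attention_heads"])

print("head_dim:", head_dim)
print("bytes/token with 2 KV groups:", grouped)
print("bytes/token with 32 KV heads:", full)
print("MiB at seq_length:", grouped * config["seq_length"] / 2**20)
```

Under these assumptions the grouped layout needs 16 times less cache per token than keeping a key/value pair for every attention head.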
[INFO|modeling_utils.py:3121] 2023-12-09 15:27:00,104 >> loading weights file pytorch_model.bin from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/pytorch_model.bin.index.json
[INFO|configuration_utils.py:791] 2023-12-09 15:27:00,113 >> Generate config GenerationConfig {
"eos_token_id": 2,
"pad_token_id": 0
}
Loading checkpoint shards: 0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 14%|█▍        | 1/7 [00:02<00:12, 2.14s/it]
Loading checkpoint shards: 14%|█▍        | 1/7 [00:02<00:12, 2.17s/it]
Loading checkpoint shards: 29%|██▊       | 2/7 [00:04<00:10, 2.18s/it]
Loading checkpoint shards: 29%|██▊       | 2/7 [00:04<00:11, 2.20s/it]
Loading checkpoint shards: 43%|████▎     | 3/7 [00:06<00:08, 2.14s/it]
Loading checkpoint shards: 43%|████▎     | 3/7 [00:06<00:09, 2.26s/it]
Loading checkpoint shards: 57%|█████▋    | 4/7 [00:08<00:06, 2.11s/it]
Loading checkpoint shards: 57%|█████▋    | 4/7 [00:09<00:06, 2.31s/it]
Loading checkpoint shards: 71%|███████▏  | 5/7 [00:10<00:04, 2.24s/it]
Loading checkpoint shards: 71%|███████▏  | 5/7 [00:11<00:04, 2.39s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:13<00:02, 2.33s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:13<00:02, 2.37s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.08s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.15s/it]
[INFO|modeling_utils.py:3950] 2023-12-09 15:27:15,240 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3958] 2023-12-09 15:27:15,240 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at THUDM/chatglm3-6b-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3525] 2023-12-09 15:27:15,493 >> Generation config file not found, using a generation config created from the model config.
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.05s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:15<00:00, 2.19s/it]
Train dataset size: 52002
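With max_steps=500 set, the step budget takes precedence over num_train_epochs=3.0, so this run covers only a small part of the 52002 training examples. The arithmetic below is a sketch of that coverage. It assumes a world size of 2, inferred from process ranks 0 and 1 appearing in this log, and that each rank consumes distinct examples; both are assumptions rather than logged values.

```python
# Values copied from the logged training arguments and dataset size.
train_dataset_size = 52002
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
max_steps = 500
save_steps = 50

# Assumption: two processes, inferred from ranks 0 and 1 in the log.
world_size = 2

examples_per_step = (per_device_train_batch_size
                     * gradient_accumulation_steps
                     * world_size)
examples_seen = examples_per_step * max_steps
fraction_of_epoch = examples_seen / train_dataset_size
checkpoints = max_steps // save_steps

print("examples per optimizer step:", examples_per_step)
print("examples seen in the run:", examples_seen)
print(f"fraction of one epoch: {fraction_of_epoch:.4f}")
print("checkpoints written (one every save_steps):", checkpoints)
```

Under these assumptions the run sees about 1000 examples, roughly 2% of one pass over the dataset.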
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
[1087 further identical padding lines ("'': 0 -> -100") omitted]
<<<<<<<<<<<<< Sanity Check
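The dump above follows a common supervised fine-tuning pattern. Prompt tokens map to -100, response tokens up to and including the end-of-sequence id 2 keep their own ids as labels, and padding (id 0) maps to -100. The value -100 is the default ignore index of PyTorch's cross-entropy loss, so only the response contributes to the loss. The sketch below reproduces that mapping on a shortened example. The function `build_labels` and the `max_length` value are hypothetical and are not taken from the training script.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_TOKEN_ID = 0     # "pad_token_id" in the logged model config


def build_labels(prompt_ids, response_ids, max_length):
    """Return (input_ids, labels) padded to max_length.

    Prompt and padding positions get IGNORE_INDEX; response positions
    keep their token ids.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    pad = max_length - len(input_ids)
    if pad < 0:
        raise ValueError("sequence is longer than max_length")
    input_ids += [PAD_TOKEN_ID] * pad
    labels += [IGNORE_INDEX] * pad
    return input_ids, labels


# Shortened example using token ids that appear in the dump above.
prompt_ids = [64790, 64792, 29101, 30954]  # '[gMASK]', 'sop', 'Instruction', ':'
response_ids = [30910, 30939, 30930, 2]    # '', '1', '.', end-of-sequence
input_ids, labels = build_labels(prompt_ids, response_ids, max_length=12)

for token_id, label in zip(input_ids, labels):
    print(f"{token_id} -> {label}")
```

Printing each pair gives the same "id -> label" layout as the sanity check.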
Train dataset size: 52002
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
[477 further identical padding lines ("'': 0 -> -100") omitted]
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
<<<<<<<<<<<<< Sanity Check
[INFO|trainer.py:544] 2023-12-09 15:27:20,460 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:1723] 2023-12-09 15:27:22,980 >> ***** Running training *****
[INFO|trainer.py:1724] 2023-12-09 15:27:22,981 >> Num examples = 52,002
[INFO|trainer.py:1725] 2023-12-09 15:27:22,981 >> Num Epochs = 1
[INFO|trainer.py:1726] 2023-12-09 15:27:22,981 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1729] 2023-12-09 15:27:22,981 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:1730] 2023-12-09 15:27:22,981 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1731] 2023-12-09 15:27:22,981 >> Total optimization steps = 500
[INFO|trainer.py:1732] 2023-12-09 15:27:22,983 >> Number of trainable parameters = 1,949,696
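Note: the header values above can be cross-checked with a little arithmetic. The sketch below is not part of the training script. It assumes a world size of 2 (inferred from 1 example per device x 1 accumulation step giving a total batch of 2), a base learning rate of 1e-4 (inferred from the output directory name and the first logged learning rate), and a linear decay to zero with no warmup (consistent with the logged values, but not confirmed by this log). All function names are hypothetical.

```python
import math


def total_train_batch_size(per_device: int, world_size: int, grad_accum: int) -> int:
    """Effective batch size per optimizer step across all processes."""
    return per_device * world_size * grad_accum


def linear_decay_lr(base_lr: float, step: int, max_steps: int) -> float:
    """Learning rate after `step` optimizer steps, assuming linear decay to 0 and no warmup."""
    return base_lr * max(0.0, (max_steps - step) / max_steps)


def epoch_fraction(step: int, total_batch: int, num_examples: int) -> float:
    """Fraction of one pass over the data completed after `step` optimizer steps."""
    steps_per_epoch = math.ceil(num_examples / total_batch)
    return step / steps_per_epoch


if __name__ == "__main__":
    # Values taken from the "Running training" header; world size 2 is an assumption.
    batch = total_train_batch_size(per_device=1, world_size=2, grad_accum=1)
    print("total train batch size:", batch)
    for step in (1, 50, 131):
        lr = linear_decay_lr(base_lr=1e-4, step=step, max_steps=500)
        epoch = round(epoch_fraction(step, batch, num_examples=52002), 2)
        print(step, lr, epoch)
```

Under these assumptions the 500 optimization steps cover only about 2% of the 52,002 examples, which is why the logged epoch value stays at 0.0 for most of the run.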
0%| | 0/500 [00:00<?, ?it/s]
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
0%| | 1/500 [00:02<21:33, 2.59s/it] {'loss': 1.0994, 'learning_rate': 9.98e-05, 'epoch': 0.0}
0%| | 2/500 [00:02<09:59, 1.20s/it] {'loss': 0.0, 'learning_rate': 9.960000000000001e-05, 'epoch': 0.0}
1%| | 3/500 [00:03<06:17, 1.32it/s] {'loss': 0.0, 'learning_rate': 9.94e-05, 'epoch': 0.0}
1%| | 4/500 [00:03<04:33, 1.82it/s] {'loss': 0.0, 'learning_rate': 9.92e-05, 'epoch': 0.0}
1%| | 5/500 [00:03<03:35, 2.30it/s] {'loss': 0.0, 'learning_rate': 9.900000000000001e-05, 'epoch': 0.0}
1%| | 6/500 [00:03<03:00, 2.74it/s] {'loss': 0.0, 'learning_rate': 9.88e-05, 'epoch': 0.0}
1%|▏ | 7/500 [00:03<02:37, 3.13it/s] {'loss': 0.0, 'learning_rate': 9.86e-05, 'epoch': 0.0}
2%|▏ | 8/500 [00:04<02:23, 3.44it/s] {'loss': 0.0, 'learning_rate': 9.84e-05, 'epoch': 0.0}
2%|▏ | 9/500 [00:04<02:14, 3.66it/s] {'loss': 0.0, 'learning_rate': 9.82e-05, 'epoch': 0.0}
2%|▏ | 10/500 [00:04<02:07, 3.83it/s] {'loss': 0.0, 'learning_rate': 9.8e-05, 'epoch': 0.0}
2%|▏ | 11/500 [00:04<02:02, 3.99it/s] {'loss': 0.0, 'learning_rate': 9.78e-05, 'epoch': 0.0}
2%|▏ | 12/500 [00:05<01:59, 4.07it/s] {'loss': 0.0, 'learning_rate': 9.76e-05, 'epoch': 0.0}
3%|▎ | 13/500 [00:05<01:57, 4.15it/s] {'loss': 0.0, 'learning_rate': 9.74e-05, 'epoch': 0.0}
3%|▎ | 14/500 [00:05<01:55, 4.22it/s] {'loss': 0.0, 'learning_rate': 9.72e-05, 'epoch': 0.0}
3%|▎ | 15/500 [00:05<01:54, 4.23it/s] {'loss': 0.0, 'learning_rate': 9.7e-05, 'epoch': 0.0}
3%|▎ | 16/500 [00:06<01:54, 4.24it/s] {'loss': 0.0, 'learning_rate': 9.680000000000001e-05, 'epoch': 0.0}
3%|▎ | 17/500 [00:06<01:53, 4.25it/s] {'loss': 0.0, 'learning_rate': 9.66e-05, 'epoch': 0.0}
4%|▎ | 18/500 [00:06<01:53, 4.26it/s] {'loss': 0.0, 'learning_rate': 9.64e-05, 'epoch': 0.0}
4%|▍ | 19/500 [00:06<01:52, 4.27it/s] {'loss': 0.0, 'learning_rate': 9.620000000000001e-05, 'epoch': 0.0}
4%|▍ | 20/500 [00:06<01:51, 4.30it/s] {'loss': 0.0, 'learning_rate': 9.6e-05, 'epoch': 0.0}
4%|▍ | 21/500 [00:07<01:51, 4.29it/s] {'loss': 0.0, 'learning_rate': 9.58e-05, 'epoch': 0.0}
4%|▍ | 22/500 [00:07<01:51, 4.27it/s] {'loss': 0.0, 'learning_rate': 9.56e-05, 'epoch': 0.0}
5%|▍ | 23/500 [00:07<01:51, 4.28it/s] {'loss': 0.0, 'learning_rate': 9.54e-05, 'epoch': 0.0}
5%|▍ | 24/500 [00:07<01:51, 4.28it/s] {'loss': 0.0, 'learning_rate': 9.52e-05, 'epoch': 0.0}
5%|▌ | 25/500 [00:08<01:50, 4.30it/s] {'loss': 0.0, 'learning_rate': 9.5e-05, 'epoch': 0.0}
5%|▌ | 26/500 [00:08<01:49, 4.31it/s] {'loss': 0.0, 'learning_rate': 9.48e-05, 'epoch': 0.0}
5%|▌ | 27/500 [00:08<01:49, 4.32it/s] {'loss': 0.0, 'learning_rate': 9.46e-05, 'epoch': 0.0}
6%|▌ | 28/500 [00:08<01:48, 4.34it/s] {'loss': 0.0, 'learning_rate': 9.44e-05, 'epoch': 0.0}
6%|▌ | 29/500 [00:09<01:48, 4.32it/s] {'loss': 0.0, 'learning_rate': 9.42e-05, 'epoch': 0.0}
6%|▌ | 30/500 [00:09<01:48, 4.33it/s] {'loss': 0.0, 'learning_rate': 9.4e-05, 'epoch': 0.0}
6%|▌ | 31/500 [00:09<01:58, 3.97it/s] {'loss': 0.0, 'learning_rate': 9.38e-05, 'epoch': 0.0}
6%|▋ | 32/500 [00:09<02:11, 3.57it/s] {'loss': 0.0, 'learning_rate': 9.360000000000001e-05, 'epoch': 0.0}
7%|▋ | 33/500 [00:10<02:07, 3.66it/s] {'loss': 0.0, 'learning_rate': 9.340000000000001e-05, 'epoch': 0.0}
7%|▋ | 34/500 [00:10<02:19, 3.34it/s] {'loss': 0.0, 'learning_rate': 9.320000000000002e-05, 'epoch': 0.0}
7%|▋ | 35/500 [00:10<02:24, 3.22it/s] {'loss': 0.0, 'learning_rate': 9.300000000000001e-05, 'epoch': 0.0}
7%|▋ | 36/500 [00:11<02:12, 3.50it/s] {'loss': 0.0, 'learning_rate': 9.28e-05, 'epoch': 0.0}
7%|▋ | 37/500 [00:11<02:05, 3.70it/s] {'loss': 0.0, 'learning_rate': 9.260000000000001e-05, 'epoch': 0.0}
8%|▊ | 38/500 [00:11<02:00, 3.85it/s] {'loss': 0.0, 'learning_rate': 9.240000000000001e-05, 'epoch': 0.0}
8%|▊ | 39/500 [00:11<01:56, 3.96it/s] {'loss': 0.0, 'learning_rate': 9.22e-05, 'epoch': 0.0}
8%|▊ | 40/500 [00:12<01:53, 4.04it/s] {'loss': 0.0, 'learning_rate': 9.200000000000001e-05, 'epoch': 0.0}
8%|▊ | 41/500 [00:12<01:52, 4.09it/s] {'loss': 0.0, 'learning_rate': 9.180000000000001e-05, 'epoch': 0.0}
8%|▊ | 42/500 [00:12<01:50, 4.15it/s] {'loss': 0.0, 'learning_rate': 9.16e-05, 'epoch': 0.0}
9%|▊ | 43/500 [00:12<01:48, 4.20it/s] {'loss': 0.0, 'learning_rate': 9.140000000000001e-05, 'epoch': 0.0}
9%|▉ | 44/500 [00:13<01:48, 4.21it/s] {'loss': 0.0, 'learning_rate': 9.120000000000001e-05, 'epoch': 0.0}
9%|▉ | 45/500 [00:13<01:47, 4.25it/s] {'loss': 0.0, 'learning_rate': 9.1e-05, 'epoch': 0.0}
9%|▉ | 46/500 [00:13<01:45, 4.30it/s] {'loss': 0.0, 'learning_rate': 9.080000000000001e-05, 'epoch': 0.0}
9%|▉ | 47/500 [00:13<01:44, 4.32it/s] {'loss': 0.0, 'learning_rate': 9.06e-05, 'epoch': 0.0}
10%|▉ | 48/500 [00:13<01:44, 4.31it/s] {'loss': 0.0, 'learning_rate': 9.04e-05, 'epoch': 0.0}
10%|▉ | 49/500 [00:14<01:44, 4.33it/s] {'loss': 0.0, 'learning_rate': 9.020000000000001e-05, 'epoch': 0.0}
10%|█ | 50/500 [00:14<01:44, 4.31it/s] {'loss': 0.0, 'learning_rate': 9e-05, 'epoch': 0.0}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:27:38,144 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-50/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:27:38,144 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-50/special_tokens_map.json
10%|█ | 51/500 [00:14<01:51, 4.03it/s] {'loss': 0.0, 'learning_rate': 8.98e-05, 'epoch': 0.0}
10%|█ | 52/500 [00:14<01:49, 4.08it/s] {'loss': 0.0, 'learning_rate': 8.960000000000001e-05, 'epoch': 0.0}
11%|█ | 53/500 [00:15<01:47, 4.14it/s] {'loss': 0.0, 'learning_rate': 8.94e-05, 'epoch': 0.0}
11%|█ | 54/500 [00:15<01:46, 4.20it/s] {'loss': 0.0, 'learning_rate': 8.92e-05, 'epoch': 0.0}
11%|█ | 55/500 [00:15<01:45, 4.24it/s] {'loss': 0.0, 'learning_rate': 8.900000000000001e-05, 'epoch': 0.0}
11%|█ | 56/500 [00:15<01:43, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.88e-05, 'epoch': 0.0}
11%|█▏ | 57/500 [00:16<01:43, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.86e-05, 'epoch': 0.0}
12%|█▏ | 58/500 [00:16<01:43, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.840000000000001e-05, 'epoch': 0.0}
12%|█▏ | 59/500 [00:16<01:43, 4.28it/s] {'loss': 0.0, 'learning_rate': 8.82e-05, 'epoch': 0.0}
12%|█▏ | 60/500 [00:16<01:42, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.0}
12%|█▏ | 61/500 [00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.78e-05, 'epoch': 0.0}
12%|█▏ | 62/500 [00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.76e-05, 'epoch': 0.0}
13%|█▎ | 63/500 [00:17<01:41, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.740000000000001e-05, 'epoch': 0.0}
13%|█▎ | 64/500 [00:17<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.72e-05, 'epoch': 0.0}
13%|█▎ | 65/500 [00:17<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.7e-05, 'epoch': 0.0}
13%|█▎ | 66/500 [00:18<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.680000000000001e-05, 'epoch': 0.0}
13%|█▎ | 67/500 [00:18<01:40, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.66e-05, 'epoch': 0.0}
14%|█▎ | 68/500 [00:18<01:39, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.64e-05, 'epoch': 0.0}
14%|█▍ | 69/500 [00:18<01:39, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.620000000000001e-05, 'epoch': 0.0}
14%|█▍ | 70/500 [00:19<01:39, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.6e-05, 'epoch': 0.0}
14%|█▍ | 71/500 [00:19<01:38, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.58e-05, 'epoch': 0.0}
14%|█▍ | 72/500 [00:19<01:38, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.560000000000001e-05, 'epoch': 0.0}
15%|█▍ | 73/500 [00:19<01:38, 4.35it/s] {'loss': 0.0, 'learning_rate': 8.54e-05, 'epoch': 0.0}
15%|█▍ | 74/500 [00:20<01:38, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.52e-05, 'epoch': 0.0}
15%|█▌ | 75/500 [00:20<01:38, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.5e-05, 'epoch': 0.0}
15%|█▌ | 76/500 [00:20<01:37, 4.34it/s] {'loss': 0.0, 'learning_rate': 8.48e-05, 'epoch': 0.0}
15%|█▌ | 77/500 [00:20<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.46e-05, 'epoch': 0.0}
16%|█▌ | 78/500 [00:20<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.44e-05, 'epoch': 0.0}
16%|█▌ | 79/500 [00:21<01:37, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.42e-05, 'epoch': 0.0}
16%|█▌ | 80/500 [00:21<01:36, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.4e-05, 'epoch': 0.0}
16%|█▌ | 81/500 [00:21<01:36, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.38e-05, 'epoch': 0.0}
16%|█▋ | 82/500 [00:21<01:37, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.36e-05, 'epoch': 0.0}
17%|█▋ | 83/500 [00:22<01:37, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.34e-05, 'epoch': 0.0}
17%|█▋ | 84/500 [00:22<01:36, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.32e-05, 'epoch': 0.0}
17%|█▋ | 85/500 [00:22<01:36, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.3e-05, 'epoch': 0.0}
17%|█▋ | 86/500 [00:22<01:36, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.28e-05, 'epoch': 0.0}
17%|█▋ | 87/500 [00:23<01:36, 4.27it/s] {'loss': 0.0, 'learning_rate': 8.26e-05, 'epoch': 0.0}
18%|█▊ | 88/500 [00:23<01:36, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.24e-05, 'epoch': 0.0}
18%|█▊ | 89/500 [00:23<01:35, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.22e-05, 'epoch': 0.0}
18%|█▊ | 90/500 [00:23<01:35, 4.31it/s] {'loss': 0.0, 'learning_rate': 8.2e-05, 'epoch': 0.0}
18%|█▊ | 91/500 [00:23<01:34, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.18e-05, 'epoch': 0.0}
18%|█▊ | 92/500 [00:24<01:34, 4.32it/s] {'loss': 0.0, 'learning_rate': 8.16e-05, 'epoch': 0.0}
19%|█▊ | 93/500 [00:24<01:34, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.14e-05, 'epoch': 0.0}
19%|█▉ | 94/500 [00:24<01:34, 4.28it/s] {'loss': 0.0, 'learning_rate': 8.120000000000001e-05, 'epoch': 0.0}
19%|█▉ | 95/500 [00:24<01:34, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.1e-05, 'epoch': 0.0}
19%|█▉ | 96/500 [00:25<01:33, 4.30it/s] {'loss': 0.0, 'learning_rate': 8.080000000000001e-05, 'epoch': 0.0}
19%|█▉ | 97/500 [00:25<01:33, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.060000000000001e-05, 'epoch': 0.0}
20%|█▉ | 98/500 [00:25<01:32, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.04e-05, 'epoch': 0.0}
20%|█▉ | 99/500 [00:25<01:32, 4.33it/s] {'loss': 0.0, 'learning_rate': 8.020000000000001e-05, 'epoch': 0.0}
20%|██ | 100/500 [00:26<01:32, 4.34it/s] {'loss': 0.0, 'learning_rate': 8e-05, 'epoch': 0.0}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:27:49,786 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:27:49,786 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-100/special_tokens_map.json
20%|██ | 101/500 [00:26<01:37, 4.08it/s] {'loss': 0.0, 'learning_rate': 7.98e-05, 'epoch': 0.0}
20%|██ | 102/500 [00:26<01:35, 4.17it/s] {'loss': 0.0, 'learning_rate': 7.960000000000001e-05, 'epoch': 0.0}
21%|██ | 103/500 [00:26<01:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.94e-05, 'epoch': 0.0}
21%|██ | 104/500 [00:27<01:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.920000000000001e-05, 'epoch': 0.0}
21%|██ | 105/500 [00:27<01:32, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.900000000000001e-05, 'epoch': 0.0}
21%|██ | 106/500 [00:27<01:31, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.88e-05, 'epoch': 0.0}
21%|██▏ | 107/500 [00:27<01:31, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.860000000000001e-05, 'epoch': 0.0}
22%|██▏ | 108/500 [00:27<01:30, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.840000000000001e-05, 'epoch': 0.0}
22%|██▏ | 109/500 [00:28<01:31, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.82e-05, 'epoch': 0.0}
22%|██▏ | 110/500 [00:28<01:31, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.0}
22%|██▏ | 111/500 [00:28<01:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.780000000000001e-05, 'epoch': 0.0}
22%|██▏ | 112/500 [00:28<01:30, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.76e-05, 'epoch': 0.0}
23%|██▎ | 113/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.740000000000001e-05, 'epoch': 0.0}
23%|██▎ | 114/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.72e-05, 'epoch': 0.0}
23%|██▎ | 115/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.7e-05, 'epoch': 0.0}
23%|██▎ | 116/500 [00:29<01:29, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.680000000000001e-05, 'epoch': 0.0}
23%|██▎ | 117/500 [00:30<01:28, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.66e-05, 'epoch': 0.0}
24%|██▎ | 118/500 [00:30<01:28, 4.31it/s] {'loss': 0.0, 'learning_rate': 7.64e-05, 'epoch': 0.0}
24%|██▍ | 119/500 [00:30<01:28, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.620000000000001e-05, 'epoch': 0.0}
24%|██▍ | 120/500 [00:30<01:27, 4.33it/s] {'loss': 0.0, 'learning_rate': 7.6e-05, 'epoch': 0.0}
24%|██▍ | 121/500 [00:30<01:27, 4.32it/s] {'loss': 0.0, 'learning_rate': 7.58e-05, 'epoch': 0.0}
24%|██▍ | 122/500 [00:31<01:27, 4.34it/s] {'loss': 0.0, 'learning_rate': 7.560000000000001e-05, 'epoch': 0.0}
25%|██▍ | 123/500 [00:31<01:27, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.54e-05, 'epoch': 0.0}
25%|██▍ | 124/500 [00:31<01:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.52e-05, 'epoch': 0.0}
25%|██▌ | 125/500 [00:31<01:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.500000000000001e-05, 'epoch': 0.0}
25%|██▌ | 126/500 [00:32<01:27, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.48e-05, 'epoch': 0.0}
25%|██▌ | 127/500 [00:32<01:27, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.46e-05, 'epoch': 0.0}
26%|██▌ | 128/500 [00:32<01:26, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.44e-05, 'epoch': 0.0}
26%|██▌ | 129/500 [00:32<01:26, 4.29it/s] {'loss': 0.0, 'learning_rate': 7.42e-05, 'epoch': 0.0}
26%|██▌ | 130/500 [00:33<01:25, 4.30it/s] {'loss': 0.0, 'learning_rate': 7.4e-05, 'epoch': 0.0}
26%|██▌ | 131/500 [00:33<01:26, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.38e-05, 'epoch': 0.01}
26%|██▋ | 132/500 [00:33<01:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.36e-05, 'epoch': 0.01}
27%|██▋ | 133/500 [00:33<01:26, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.340000000000001e-05, 'epoch': 0.01}
27%|██▋ | 134/500 [00:33<01:26, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.32e-05, 'epoch': 0.01}
27%|██▋ | 135/500 [00:34<01:25, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.3e-05, 'epoch': 0.01}
27%|██▋ | 136/500 [00:34<01:25, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.280000000000001e-05, 'epoch': 0.01}
27%|██▋ | 137/500 [00:34<01:25, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.26e-05, 'epoch': 0.01}
28%|██▊ | 138/500 [00:34<01:25, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.24e-05, 'epoch': 0.01}
28%|██▊ | 139/500 [00:35<01:24, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.22e-05, 'epoch': 0.01}
28%|██▊ | 140/500 [00:35<01:24, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.2e-05, 'epoch': 0.01}
28%|██▊ | 141/500 [00:35<01:24, 4.23it/s] {'loss': 0.0, 'learning_rate': 7.18e-05, 'epoch': 0.01}
28%|██▊ | 142/500 [00:35<01:24, 4.22it/s] {'loss': 0.0, 'learning_rate': 7.16e-05, 'epoch': 0.01}
29%|██▊ | 143/500 [00:36<01:24, 4.24it/s] {'loss': 0.0, 'learning_rate': 7.14e-05, 'epoch': 0.01}
29%|██▉ | 144/500 [00:36<01:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.12e-05, 'epoch': 0.01}
29%|██▉ | 145/500 [00:36<01:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.1e-05, 'epoch': 0.01}
29%|██▉ | 146/500 [00:36<01:22, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.08e-05, 'epoch': 0.01}
29%|β–ˆβ–ˆβ–‰ | 146/500 [00:36<01:22, 4.28it/s] 29%|β–ˆβ–ˆβ–‰ | 147/500 [00:37<01:22, 4.25it/s] {'loss': 0.0, 'learning_rate': 7.06e-05, 'epoch': 0.01}
29%|β–ˆβ–ˆβ–‰ | 147/500 [00:37<01:22, 4.25it/s] 30%|β–ˆβ–ˆβ–‰ | 148/500 [00:37<01:22, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.04e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–‰ | 148/500 [00:37<01:22, 4.28it/s] 30%|β–ˆβ–ˆβ–‰ | 149/500 [00:37<01:21, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.02e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–‰ | 149/500 [00:37<01:21, 4.28it/s] 30%|β–ˆβ–ˆβ–ˆ | 150/500 [00:37<01:21, 4.27it/s] {'loss': 0.0, 'learning_rate': 7e-05, 'epoch': 0.01}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:01,507 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-150/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:01,507 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-150/special_tokens_map.json
30%|███ | 151/500 [00:38<01:26, 4.04it/s] {'loss': 0.0, 'learning_rate': 6.98e-05, 'epoch': 0.01}
30%|███ | 152/500 [00:38<01:24, 4.13it/s] {'loss': 0.0, 'learning_rate': 6.96e-05, 'epoch': 0.01}
31%|███ | 153/500 [00:38<01:23, 4.16it/s] {'loss': 0.0, 'learning_rate': 6.939999999999999e-05, 'epoch': 0.01}
31%|███ | 154/500 [00:38<01:22, 4.21it/s] {'loss': 0.0, 'learning_rate': 6.92e-05, 'epoch': 0.01}
31%|███ | 155/500 [00:38<01:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 6.9e-05, 'epoch': 0.01}
31%|███ | 156/500 [00:39<01:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 6.879999999999999e-05, 'epoch': 0.01}
31%|███▏ | 157/500 [00:39<01:20, 4.25it/s] {'loss': 0.0, 'learning_rate': 6.860000000000001e-05, 'epoch': 0.01}
32%|███▏ | 158/500 [00:39<01:20, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.840000000000001e-05, 'epoch': 0.01}
32%|███▏ | 159/500 [00:39<01:19, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.82e-05, 'epoch': 0.01}
32%|███▏ | 160/500 [00:40<01:19, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.01}
32%|███▏ | 161/500 [00:40<01:19, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.780000000000001e-05, 'epoch': 0.01}
32%|███▏ | 162/500 [00:40<01:18, 4.30it/s] {'loss': 0.0, 'learning_rate': 6.76e-05, 'epoch': 0.01}
33%|███▎ | 163/500 [00:40<01:18, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.740000000000001e-05, 'epoch': 0.01}
33%|███▎ | 164/500 [00:41<01:18, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.720000000000001e-05, 'epoch': 0.01}
33%|███▎ | 165/500 [00:41<01:18, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.7e-05, 'epoch': 0.01}
33%|███▎ | 166/500 [00:41<01:18, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.680000000000001e-05, 'epoch': 0.01}
33%|███▎ | 167/500 [00:41<01:18, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.66e-05, 'epoch': 0.01}
34%|███▎ | 168/500 [00:42<01:18, 4.23it/s] {'loss': 0.0, 'learning_rate': 6.64e-05, 'epoch': 0.01}
34%|███▍ | 169/500 [00:42<01:18, 4.22it/s] {'loss': 0.0, 'learning_rate': 6.620000000000001e-05, 'epoch': 0.01}
34%|███▍ | 170/500 [00:42<01:19, 4.16it/s] {'loss': 0.0, 'learning_rate': 6.6e-05, 'epoch': 0.01}
34%|███▍ | 171/500 [00:42<01:31, 3.60it/s] {'loss': 0.0, 'learning_rate': 6.58e-05, 'epoch': 0.01}
34%|███▍ | 172/500 [00:43<01:36, 3.39it/s] {'loss': 0.0, 'learning_rate': 6.560000000000001e-05, 'epoch': 0.01}
35%|███▍ | 173/500 [00:43<01:32, 3.53it/s] {'loss': 0.0, 'learning_rate': 6.54e-05, 'epoch': 0.01}
35%|███▍ | 174/500 [00:43<01:27, 3.72it/s] {'loss': 0.0, 'learning_rate': 6.52e-05, 'epoch': 0.01}
35%|███▌ | 175/500 [00:43<01:24, 3.86it/s] {'loss': 0.0, 'learning_rate': 6.500000000000001e-05, 'epoch': 0.01}
35%|███▌ | 176/500 [00:44<01:21, 3.96it/s] {'loss': 0.0, 'learning_rate': 6.48e-05, 'epoch': 0.01}
35%|███▌ | 177/500 [00:44<01:19, 4.05it/s] {'loss': 0.0, 'learning_rate': 6.460000000000001e-05, 'epoch': 0.01}
36%|███▌ | 178/500 [00:44<01:17, 4.14it/s] {'loss': 0.0, 'learning_rate': 6.440000000000001e-05, 'epoch': 0.01}
36%|███▌ | 179/500 [00:44<01:16, 4.19it/s] {'loss': 0.0, 'learning_rate': 6.42e-05, 'epoch': 0.01}
36%|███▌ | 180/500 [00:45<01:16, 4.20it/s] {'loss': 0.0, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.01}
36%|███▌ | 181/500 [00:45<01:15, 4.23it/s] {'loss': 0.0, 'learning_rate': 6.38e-05, 'epoch': 0.01}
36%|███▋ | 182/500 [00:45<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.36e-05, 'epoch': 0.01}
37%|███▋ | 183/500 [00:45<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.340000000000001e-05, 'epoch': 0.01}
37%|███▋ | 184/500 [00:46<01:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.32e-05, 'epoch': 0.01}
37%|███▋ | 185/500 [00:46<01:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.3e-05, 'epoch': 0.01}
37%|███▋ | 186/500 [00:46<01:13, 4.25it/s] {'loss': 0.0, 'learning_rate': 6.280000000000001e-05, 'epoch': 0.01}
37%|███▋ | 187/500 [00:46<01:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.26e-05, 'epoch': 0.01}
38%|███▊ | 188/500 [00:46<01:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.24e-05, 'epoch': 0.01}
38%|███▊ | 189/500 [00:47<01:12, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.220000000000001e-05, 'epoch': 0.01}
38%|███▊ | 190/500 [00:47<01:12, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.2e-05, 'epoch': 0.01}
38%|███▊ | 191/500 [00:47<01:12, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.18e-05, 'epoch': 0.01}
38%|███▊ | 192/500 [00:47<01:11, 4.29it/s] {'loss': 0.0, 'learning_rate': 6.16e-05, 'epoch': 0.01}
39%|███▊ | 193/500 [00:48<01:11, 4.28it/s] {'loss': 0.0, 'learning_rate': 6.14e-05, 'epoch': 0.01}
39%|███▉ | 194/500 [00:48<01:11, 4.30it/s] {'loss': 0.0, 'learning_rate': 6.12e-05, 'epoch': 0.01}
39%|███▉ | 195/500 [00:48<01:10, 4.31it/s] {'loss': 0.0, 'learning_rate': 6.1e-05, 'epoch': 0.01}
39%|███▉ | 196/500 [00:48<01:17, 3.93it/s] {'loss': 0.0, 'learning_rate': 6.08e-05, 'epoch': 0.01}
39%|███▉ | 197/500 [00:49<01:20, 3.79it/s] {'loss': 0.0, 'learning_rate': 6.06e-05, 'epoch': 0.01}
40%|███▉ | 198/500 [00:49<01:43, 2.92it/s] {'loss': 0.0, 'learning_rate': 6.04e-05, 'epoch': 0.01}
40%|███▉ | 199/500 [00:50<01:56, 2.58it/s] {'loss': 0.0, 'learning_rate': 6.02e-05, 'epoch': 0.01}
40%|████ | 200/500 [00:50<01:52, 2.67it/s] {'loss': 0.0, 'learning_rate': 6e-05, 'epoch': 0.01}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:14,303 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:14,304 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-200/special_tokens_map.json
40%|████ | 201/500 [00:50<01:52, 2.65it/s] {'loss': 0.0, 'learning_rate': 5.9800000000000003e-05, 'epoch': 0.01}
40%|████ | 202/500 [00:51<01:46, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.96e-05, 'epoch': 0.01}
41%|████ | 203/500 [00:51<01:51, 2.67it/s] {'loss': 0.0, 'learning_rate': 5.94e-05, 'epoch': 0.01}
41%|████ | 204/500 [00:51<01:48, 2.74it/s] {'loss': 0.0, 'learning_rate': 5.92e-05, 'epoch': 0.01}
41%|████ | 205/500 [00:52<01:39, 2.97it/s] {'loss': 0.0, 'learning_rate': 5.9e-05, 'epoch': 0.01}
41%|████ | 206/500 [00:52<01:42, 2.88it/s] {'loss': 0.0, 'learning_rate': 5.88e-05, 'epoch': 0.01}
41%|████▏ | 207/500 [00:52<01:33, 3.14it/s] {'loss': 0.0, 'learning_rate': 5.86e-05, 'epoch': 0.01}
42%|████▏ | 208/500 [00:53<01:31, 3.19it/s] {'loss': 0.0, 'learning_rate': 5.8399999999999997e-05, 'epoch': 0.01}
42%|████▏ | 209/500 [00:53<01:30, 3.20it/s] {'loss': 0.0, 'learning_rate': 5.82e-05, 'epoch': 0.01}
42%|████▏ | 210/500 [00:53<01:35, 3.03it/s] {'loss': 0.0, 'learning_rate': 5.8e-05, 'epoch': 0.01}
42%|████▏ | 211/500 [00:54<01:33, 3.09it/s] {'loss': 0.0, 'learning_rate': 5.7799999999999995e-05, 'epoch': 0.01}
42%|████▏ | 212/500 [00:54<01:38, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.76e-05, 'epoch': 0.01}
43%|████▎ | 213/500 [00:54<01:42, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.74e-05, 'epoch': 0.01}
43%|████▎ | 214/500 [00:55<01:38, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.72e-05, 'epoch': 0.01}
43%|████▎ | 215/500 [00:55<01:37, 2.91it/s] {'loss': 0.0, 'learning_rate': 5.6999999999999996e-05, 'epoch': 0.01}
43%|████▎ | 216/500 [00:55<01:40, 2.84it/s] {'loss': 0.0, 'learning_rate': 5.68e-05, 'epoch': 0.01}
43%|████▎ | 217/500 [00:56<01:31, 3.11it/s] {'loss': 0.0, 'learning_rate': 5.66e-05, 'epoch': 0.01}
44%|████▎ | 218/500 [00:56<01:30, 3.12it/s] {'loss': 0.0, 'learning_rate': 5.6399999999999995e-05, 'epoch': 0.01}
44%|████▍ | 219/500 [00:56<01:29, 3.14it/s] {'loss': 0.0, 'learning_rate': 5.620000000000001e-05, 'epoch': 0.01}
44%|████▍ | 220/500 [00:57<01:31, 3.08it/s] {'loss': 0.0, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.01}
44%|████▍ | 221/500 [00:57<01:37, 2.86it/s] {'loss': 0.0, 'learning_rate': 5.580000000000001e-05, 'epoch': 0.01}
44%|████▍ | 222/500 [00:57<01:39, 2.79it/s] {'loss': 0.0, 'learning_rate': 5.560000000000001e-05, 'epoch': 0.01}
45%|████▍ | 223/500 [00:58<01:38, 2.80it/s] {'loss': 0.0, 'learning_rate': 5.5400000000000005e-05, 'epoch': 0.01}
45%|████▍ | 224/500 [00:58<01:35, 2.89it/s] {'loss': 0.0, 'learning_rate': 5.520000000000001e-05, 'epoch': 0.01}
45%|████▌ | 225/500 [00:59<01:39, 2.77it/s] {'loss': 0.0, 'learning_rate': 5.500000000000001e-05, 'epoch': 0.01}
45%|████▌ | 226/500 [00:59<01:40, 2.72it/s] {'loss': 0.0, 'learning_rate': 5.4800000000000004e-05, 'epoch': 0.01}
45%|████▌ | 227/500 [00:59<01:40, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.4600000000000006e-05, 'epoch': 0.01}
46%|████▌ | 228/500 [01:00<01:40, 2.70it/s] {'loss': 0.0, 'learning_rate': 5.440000000000001e-05, 'epoch': 0.01}
46%|████▌ | 229/500 [01:00<01:47, 2.53it/s] {'loss': 0.0, 'learning_rate': 5.420000000000001e-05, 'epoch': 0.01}
46%|████▌ | 230/500 [01:00<01:40, 2.69it/s] {'loss': 0.0, 'learning_rate': 5.4000000000000005e-05, 'epoch': 0.01}
46%|████▌ | 231/500 [01:01<01:35, 2.83it/s] {'loss': 0.0, 'learning_rate': 5.380000000000001e-05, 'epoch': 0.01}
46%|████▋ | 232/500 [01:01<01:27, 3.08it/s] {'loss': 0.0, 'learning_rate': 5.360000000000001e-05, 'epoch': 0.01}
47%|████▋ | 233/500 [01:01<01:28, 3.02it/s] {'loss': 0.0, 'learning_rate': 5.3400000000000004e-05, 'epoch': 0.01}
47%|████▋ | 234/500 [01:02<01:30, 2.94it/s] {'loss': 0.0, 'learning_rate': 5.3200000000000006e-05, 'epoch': 0.01}
47%|████▋ | 235/500 [01:02<01:37, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.300000000000001e-05, 'epoch': 0.01}
47%|████▋ | 236/500 [01:03<01:45, 2.49it/s] {'loss': 0.0, 'learning_rate': 5.28e-05, 'epoch': 0.01}
47%|████▋ | 237/500 [01:03<01:49, 2.39it/s] {'loss': 0.0, 'learning_rate': 5.2600000000000005e-05, 'epoch': 0.01}
48%|████▊ | 238/500 [01:03<01:42, 2.55it/s] {'loss': 0.0, 'learning_rate': 5.2400000000000007e-05, 'epoch': 0.01}
48%|████▊ | 239/500 [01:04<01:36, 2.72it/s] {'loss': 0.0, 'learning_rate': 5.22e-05, 'epoch': 0.01}
48%|████▊ | 240/500 [01:04<01:36, 2.71it/s] {'loss': 0.0, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.01}
48%|████▊ | 241/500 [01:05<01:35, 2.70it/s] {'loss': 0.0, 'learning_rate': 5.1800000000000005e-05, 'epoch': 0.01}
48%|████▊ | 242/500 [01:05<01:29, 2.88it/s] {'loss': 0.0, 'learning_rate': 5.16e-05, 'epoch': 0.01}
49%|████▊ | 243/500 [01:05<01:30, 2.83it/s] {'loss': 0.0, 'learning_rate': 5.14e-05, 'epoch': 0.01}
49%|████▉ | 244/500 [01:06<01:29, 2.87it/s] {'loss': 0.0, 'learning_rate': 5.1200000000000004e-05, 'epoch': 0.01}
49%|████▉ | 245/500 [01:06<01:23, 3.04it/s] {'loss': 0.0, 'learning_rate': 5.1000000000000006e-05, 'epoch': 0.01}
49%|████▉ | 246/500 [01:06<01:23, 3.04it/s] {'loss': 0.0, 'learning_rate': 5.08e-05, 'epoch': 0.01}
49%|████▉ | 247/500 [01:06<01:21, 3.12it/s] {'loss': 0.0, 'learning_rate': 5.0600000000000003e-05, 'epoch': 0.01}
50%|████▉ | 248/500 [01:07<01:22, 3.07it/s] {'loss': 0.0, 'learning_rate': 5.0400000000000005e-05, 'epoch': 0.01}
50%|████▉ | 249/500 [01:07<01:21, 3.06it/s] {'loss': 0.0, 'learning_rate': 5.02e-05, 'epoch': 0.01}
50%|█████ | 250/500 [01:07<01:19, 3.13it/s] {'loss': 0.0, 'learning_rate': 5e-05, 'epoch': 0.01}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:31,637 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-250/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:31,638 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-250/special_tokens_map.json
50%|█████ | 251/500 [01:08<01:27, 2.85it/s] {'loss': 0.0, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.01}
50%|█████ | 252/500 [01:08<01:25, 2.90it/s] {'loss': 0.0, 'learning_rate': 4.96e-05, 'epoch': 0.01}
51%|█████ | 253/500 [01:08<01:22, 2.99it/s] {'loss': 0.0, 'learning_rate': 4.94e-05, 'epoch': 0.01}
51%|█████ | 254/500 [01:09<01:25, 2.89it/s] {'loss': 0.0, 'learning_rate': 4.92e-05, 'epoch': 0.01}
51%|█████ | 255/500 [01:09<01:24, 2.88it/s] {'loss': 0.0, 'learning_rate': 4.9e-05, 'epoch': 0.01}
51%|█████ | 256/500 [01:10<01:26, 2.84it/s] {'loss': 0.0, 'learning_rate': 4.88e-05, 'epoch': 0.01}
51%|█████▏ | 257/500 [01:10<01:21, 2.97it/s] {'loss': 0.0, 'learning_rate': 4.86e-05, 'epoch': 0.01}
52%|█████▏ | 258/500 [01:10<01:22, 2.93it/s] {'loss': 0.0, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.01}
52%|█████▏ | 259/500 [01:10<01:14, 3.24it/s] {'loss': 0.0, 'learning_rate': 4.82e-05, 'epoch': 0.01}
52%|█████▏ | 260/500 [01:11<01:08, 3.49it/s] {'loss': 0.0, 'learning_rate': 4.8e-05, 'epoch': 0.01}
52%|█████▏ | 261/500 [01:11<01:06, 3.58it/s] {'loss': 0.0, 'learning_rate': 4.78e-05, 'epoch': 0.01}
52%|█████▏ | 262/500 [01:11<01:13, 3.22it/s] {'loss': 0.0, 'learning_rate': 4.76e-05, 'epoch': 0.01}
53%|█████▎ | 263/500 [01:12<01:09, 3.40it/s] {'loss': 0.0, 'learning_rate': 4.74e-05, 'epoch': 0.01}
53%|█████▎ | 264/500 [01:12<01:15, 3.14it/s] {'loss': 0.0, 'learning_rate': 4.72e-05, 'epoch': 0.01}
53%|█████▎ | 265/500 [01:12<01:18, 3.00it/s] {'loss': 0.0, 'learning_rate': 4.7e-05, 'epoch': 0.01}
53%|█████▎ | 266/500 [01:13<01:19, 2.94it/s] {'loss': 0.0, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.01}
53%|█████▎ | 267/500 [01:13<01:20, 2.91it/s] {'loss': 0.0, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.01}
54%|█████▎ | 268/500 [01:13<01:23, 2.77it/s] {'loss': 0.0, 'learning_rate': 4.64e-05, 'epoch': 0.01}
54%|█████▍ | 269/500 [01:14<01:19, 2.91it/s] {'loss': 0.0, 'learning_rate': 4.6200000000000005e-05, 'epoch': 0.01}
54%|█████▍ | 270/500 [01:14<01:13, 3.15it/s] {'loss': 0.0, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.01}
54%|█████▍ | 271/500 [01:14<01:15, 3.04it/s] {'loss': 0.0, 'learning_rate': 4.58e-05, 'epoch': 0.01}
54%|█████▍ | 272/500 [01:15<01:13, 3.10it/s] {'loss': 0.0, 'learning_rate': 4.5600000000000004e-05, 'epoch': 0.01}
55%|█████▍ | 273/500 [01:15<01:12, 3.13it/s] {'loss': 0.0, 'learning_rate': 4.5400000000000006e-05, 'epoch': 0.01}
55%|█████▍ | 274/500 [01:15<01:18, 2.89it/s] {'loss': 0.0, 'learning_rate': 4.52e-05, 'epoch': 0.01}
55%|█████▌ | 275/500 [01:16<01:11, 3.13it/s] {'loss': 0.0, 'learning_rate': 4.5e-05, 'epoch': 0.01}
55%|█████▌ | 276/500 [01:16<01:15, 2.99it/s] {'loss': 0.0, 'learning_rate': 4.4800000000000005e-05, 'epoch': 0.01}
55%|█████▌ | 277/500 [01:16<01:15, 2.97it/s] {'loss': 0.0, 'learning_rate': 4.46e-05, 'epoch': 0.01}
56%|█████▌ | 278/500 [01:17<01:08, 3.26it/s] {'loss': 0.0, 'learning_rate': 4.44e-05, 'epoch': 0.01}
56%|█████▌ | 279/500 [01:17<01:02, 3.52it/s] {'loss': 0.0, 'learning_rate': 4.4200000000000004e-05, 'epoch': 0.01}
56%|█████▌ | 280/500 [01:17<00:59, 3.69it/s] {'loss': 0.0, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.01}
56%|█████▌ | 281/500 [01:17<00:56, 3.85it/s] {'loss': 0.0, 'learning_rate': 4.38e-05, 'epoch': 0.01}
56%|█████▋ | 282/500 [01:17<00:55, 3.96it/s] {'loss': 0.0, 'learning_rate': 4.36e-05, 'epoch': 0.01}
57%|█████▋ | 283/500 [01:18<00:53, 4.05it/s] {'loss': 0.0, 'learning_rate': 4.3400000000000005e-05, 'epoch': 0.01}
57%|█████▋ | 284/500 [01:18<00:52, 4.12it/s] {'loss': 0.0, 'learning_rate': 4.32e-05, 'epoch': 0.01}
57%|█████▋ | 285/500 [01:18<00:51, 4.17it/s] {'loss': 0.0, 'learning_rate': 4.3e-05, 'epoch': 0.01}
57%|█████▋ | 286/500 [01:18<00:50, 4.21it/s] {'loss': 0.0, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.01}
57%|█████▋ | 287/500 [01:19<00:50, 4.23it/s] {'loss': 0.0, 'learning_rate': 4.26e-05, 'epoch': 0.01}
58%|█████▊ | 288/500 [01:19<00:50, 4.23it/s] {'loss': 0.0, 'learning_rate': 4.24e-05, 'epoch': 0.01}
58%|█████▊ | 289/500 [01:19<00:49, 4.25it/s] {'loss': 0.0, 'learning_rate': 4.22e-05, 'epoch': 0.01}
58%|█████▊ | 290/500 [01:19<00:49, 4.26it/s] {'loss': 0.0, 'learning_rate': 4.2e-05, 'epoch': 0.01}
58%|█████▊ | 291/500 [01:20<00:48, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.18e-05, 'epoch': 0.01}
58%|█████▊ | 292/500 [01:20<00:48, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.16e-05, 'epoch': 0.01}
59%|█████▊ | 293/500 [01:20<00:48, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.14e-05, 'epoch': 0.01}
59%|█████▉ | 294/500 [01:20<00:48, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.12e-05, 'epoch': 0.01}
59%|█████▉ | 295/500 [01:21<00:47, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.1e-05, 'epoch': 0.01}
59%|█████▉ | 296/500 [01:21<00:47, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.08e-05, 'epoch': 0.01}
59%|█████▉ | 297/500 [01:21<00:47, 4.29it/s] {'loss': 0.0, 'learning_rate': 4.0600000000000004e-05, 'epoch': 0.01}
60%|█████▉ | 298/500 [01:21<00:47, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.01}
60%|█████▉ | 299/500 [01:21<00:47, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.02e-05, 'epoch': 0.01}
60%|██████ | 300/500 [01:22<00:46, 4.27it/s] {'loss': 0.0, 'learning_rate': 4e-05, 'epoch': 0.01}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:45,964 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:45,964 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-300/special_tokens_map.json
60%|██████ | 301/500 [01:22<00:50, 3.94it/s] {'loss': 0.0, 'learning_rate': 3.9800000000000005e-05, 'epoch': 0.01}
60%|██████ | 302/500 [01:22<00:48, 4.04it/s] {'loss': 0.0, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.01}
61%|██████ | 303/500 [01:22<00:47, 4.11it/s] {'loss': 0.0, 'learning_rate': 3.94e-05, 'epoch': 0.01}
61%|██████ | 304/500 [01:23<00:47, 4.17it/s] {'loss': 0.0, 'learning_rate': 3.9200000000000004e-05, 'epoch': 0.01}
61%|██████ | 305/500 [01:23<00:46, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.01}
61%|██████ | 306/500 [01:23<00:45, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.88e-05, 'epoch': 0.01}
61%|██████▏ | 307/500 [01:23<00:45, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.86e-05, 'epoch': 0.01}
62%|██████▏ | 308/500 [01:24<00:45, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.8400000000000005e-05, 'epoch': 0.01}
62%|██████▏ | 309/500 [01:24<00:45, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.82e-05, 'epoch': 0.01}
62%|██████▏ | 310/500 [01:24<00:44, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.8e-05, 'epoch': 0.01}
62%|██████▏ | 311/500 [01:24<00:44, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.7800000000000004e-05, 'epoch': 0.01}
62%|██████▏ | 312/500 [01:25<00:44, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.76e-05, 'epoch': 0.01}
63%|██████▎ | 313/500 [01:25<00:44, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.74e-05, 'epoch': 0.01}
63%|██████▎ | 314/500 [01:25<00:43, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.72e-05, 'epoch': 0.01}
63%|██████▎ | 315/500 [01:25<00:43, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.7e-05, 'epoch': 0.01}
63%|██████▎ | 316/500 [01:26<00:43, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.68e-05, 'epoch': 0.01}
63%|██████▎ | 317/500 [01:26<00:43, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.66e-05, 'epoch': 0.01}
64%|██████▎ | 318/500 [01:26<00:43, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.6400000000000004e-05, 'epoch': 0.01}
64%|██████▍ | 319/500 [01:26<00:42, 4.22it/s] {'loss': 0.0, 'learning_rate': 3.62e-05, 'epoch': 0.01}
64%|██████▍ | 320/500 [01:26<00:42, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.6e-05, 'epoch': 0.01}
64%|██████▍ | 321/500 [01:27<00:42, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.58e-05, 'epoch': 0.01}
64%|██████▍ | 322/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.56e-05, 'epoch': 0.01}
65%|██████▍ | 323/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.54e-05, 'epoch': 0.01}
65%|██████▍ | 324/500 [01:27<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.52e-05, 'epoch': 0.01}
65%|██████▌ | 325/500 [01:28<00:41, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.5e-05, 'epoch': 0.01}
65%|██████▌ | 326/500 [01:28<00:40, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.48e-05, 'epoch': 0.01}
65%|██████▌ | 327/500 [01:28<00:40, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.46e-05, 'epoch': 0.01}
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 327/500 [01:28<00:40, 4.29it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 328/500 [01:28<00:40, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4399999999999996e-05, 'epoch': 0.01}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 328/500 [01:28<00:40, 4.29it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 329/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4200000000000005e-05, 'epoch': 0.01}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 329/500 [01:29<00:39, 4.29it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 330/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.01}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 330/500 [01:29<00:39, 4.29it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 331/500 [01:29<00:39, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.38e-05, 'epoch': 0.01}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 331/500 [01:29<00:39, 4.28it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 332/500 [01:29<00:39, 4.29it/s] {'loss': 0.0, 'learning_rate': 3.3600000000000004e-05, 'epoch': 0.01}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 332/500 [01:29<00:39, 4.29it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 333/500 [01:30<00:39, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.3400000000000005e-05, 'epoch': 0.01}
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 333/500 [01:30<00:39, 4.27it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 334/500 [01:30<00:39, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.32e-05, 'epoch': 0.01}
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 334/500 [01:30<00:39, 4.25it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 335/500 [01:30<00:38, 4.24it/s] {'loss': 0.0, 'learning_rate': 3.3e-05, 'epoch': 0.01}
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 335/500 [01:30<00:38, 4.24it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 336/500 [01:30<00:38, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.2800000000000004e-05, 'epoch': 0.01}
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 336/500 [01:30<00:38, 4.23it/s] 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 337/500 [01:30<00:38, 4.25it/s] {'loss': 0.0, 'learning_rate': 3.26e-05, 'epoch': 0.01}
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 337/500 [01:30<00:38, 4.25it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 338/500 [01:31<00:38, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.24e-05, 'epoch': 0.01}
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 338/500 [01:31<00:38, 4.23it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 339/500 [01:31<00:38, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.2200000000000003e-05, 'epoch': 0.01}
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 339/500 [01:31<00:38, 4.21it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 340/500 [01:31<00:37, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.01}
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 340/500 [01:31<00:37, 4.23it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 341/500 [01:31<00:37, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.18e-05, 'epoch': 0.01}
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 341/500 [01:31<00:37, 4.21it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 342/500 [01:32<00:37, 4.21it/s] {'loss': 0.0, 'learning_rate': 3.16e-05, 'epoch': 0.01}
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 342/500 [01:32<00:37, 4.21it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 343/500 [01:32<00:37, 4.22it/s] {'loss': 0.0, 'learning_rate': 3.1400000000000004e-05, 'epoch': 0.01}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 343/500 [01:32<00:37, 4.22it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 344/500 [01:32<00:36, 4.23it/s] {'loss': 0.0, 'learning_rate': 3.12e-05, 'epoch': 0.01}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 344/500 [01:32<00:36, 4.23it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 345/500 [01:32<00:36, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.1e-05, 'epoch': 0.01}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 345/500 [01:32<00:36, 4.26it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 346/500 [01:33<00:36, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.08e-05, 'epoch': 0.01}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 346/500 [01:33<00:36, 4.28it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 347/500 [01:33<00:35, 4.28it/s] {'loss': 0.0, 'learning_rate': 3.06e-05, 'epoch': 0.01}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 347/500 [01:33<00:35, 4.28it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 348/500 [01:33<00:35, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.04e-05, 'epoch': 0.01}
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 348/500 [01:33<00:35, 4.27it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 349/500 [01:33<00:35, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.02e-05, 'epoch': 0.01}
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 349/500 [01:33<00:35, 4.27it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 350/500 [01:34<00:35, 4.28it/s] {'loss': 0.0, 'learning_rate': 3e-05, 'epoch': 0.01}
70%|███████ | 350/500 [01:34<00:35, 4.28it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:28:57,758 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-350/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:28:57,758 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-350/special_tokens_map.json
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 351/500 [01:34<00:37, 3.96it/s] {'loss': 0.0, 'learning_rate': 2.98e-05, 'epoch': 0.01}
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 351/500 [01:34<00:37, 3.96it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 352/500 [01:34<00:36, 4.05it/s] {'loss': 0.0, 'learning_rate': 2.96e-05, 'epoch': 0.01}
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 352/500 [01:34<00:36, 4.05it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 353/500 [01:34<00:35, 4.12it/s] {'loss': 0.0, 'learning_rate': 2.94e-05, 'epoch': 0.01}
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 353/500 [01:34<00:35, 4.12it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 354/500 [01:35<00:35, 4.13it/s] {'loss': 0.0, 'learning_rate': 2.9199999999999998e-05, 'epoch': 0.01}
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 354/500 [01:35<00:35, 4.13it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 355/500 [01:35<00:34, 4.17it/s] {'loss': 0.0, 'learning_rate': 2.9e-05, 'epoch': 0.01}
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 355/500 [01:35<00:34, 4.17it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 356/500 [01:35<00:34, 4.21it/s] {'loss': 0.0, 'learning_rate': 2.88e-05, 'epoch': 0.01}
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 356/500 [01:35<00:34, 4.21it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 357/500 [01:35<00:33, 4.21it/s] {'loss': 0.0, 'learning_rate': 2.86e-05, 'epoch': 0.01}
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 357/500 [01:35<00:33, 4.21it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 358/500 [01:35<00:33, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.84e-05, 'epoch': 0.01}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 358/500 [01:35<00:33, 4.22it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 359/500 [01:36<00:33, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.8199999999999998e-05, 'epoch': 0.01}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 359/500 [01:36<00:33, 4.24it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 360/500 [01:36<00:33, 4.23it/s] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.01}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 360/500 [01:36<00:33, 4.23it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 361/500 [01:36<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7800000000000005e-05, 'epoch': 0.01}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 361/500 [01:36<00:32, 4.24it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 362/500 [01:36<00:32, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.7600000000000003e-05, 'epoch': 0.01}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 362/500 [01:36<00:32, 4.25it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 363/500 [01:37<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7400000000000002e-05, 'epoch': 0.01}
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 363/500 [01:37<00:32, 4.24it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 364/500 [01:37<00:32, 4.24it/s] {'loss': 0.0, 'learning_rate': 2.7200000000000004e-05, 'epoch': 0.01}
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 364/500 [01:37<00:32, 4.24it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 365/500 [01:37<00:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.01}
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 365/500 [01:37<00:31, 4.27it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 366/500 [01:37<00:31, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.6800000000000004e-05, 'epoch': 0.01}
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 366/500 [01:37<00:31, 4.28it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 367/500 [01:38<00:31, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.6600000000000003e-05, 'epoch': 0.01}
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 367/500 [01:38<00:31, 4.27it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 368/500 [01:38<00:30, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.64e-05, 'epoch': 0.01}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 368/500 [01:38<00:30, 4.28it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 369/500 [01:38<00:30, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.6200000000000003e-05, 'epoch': 0.01}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 369/500 [01:38<00:30, 4.27it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 370/500 [01:38<00:30, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.01}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 370/500 [01:38<00:30, 4.26it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 371/500 [01:38<00:30, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.58e-05, 'epoch': 0.01}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 371/500 [01:39<00:30, 4.26it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 372/500 [01:39<00:29, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.5600000000000002e-05, 'epoch': 0.01}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 372/500 [01:39<00:29, 4.27it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 373/500 [01:39<00:29, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.54e-05, 'epoch': 0.01}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 373/500 [01:39<00:29, 4.26it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 374/500 [01:39<00:29, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.5200000000000003e-05, 'epoch': 0.01}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 374/500 [01:39<00:29, 4.26it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 375/500 [01:39<00:29, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.5e-05, 'epoch': 0.01}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 375/500 [01:39<00:29, 4.27it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 376/500 [01:40<00:28, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.48e-05, 'epoch': 0.01}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 376/500 [01:40<00:28, 4.28it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 377/500 [01:40<00:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.46e-05, 'epoch': 0.01}
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 377/500 [01:40<00:28, 4.26it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 378/500 [01:40<00:28, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.44e-05, 'epoch': 0.01}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 378/500 [01:40<00:28, 4.26it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 379/500 [01:40<00:28, 4.23it/s] {'loss': 0.0, 'learning_rate': 2.4200000000000002e-05, 'epoch': 0.01}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 379/500 [01:40<00:28, 4.23it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 380/500 [01:41<00:28, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.4e-05, 'epoch': 0.01}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 380/500 [01:41<00:28, 4.22it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 381/500 [01:41<00:28, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.38e-05, 'epoch': 0.01}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 381/500 [01:41<00:28, 4.22it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 382/500 [01:41<00:27, 4.22it/s] {'loss': 0.0, 'learning_rate': 2.36e-05, 'epoch': 0.01}
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 382/500 [01:41<00:27, 4.22it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 383/500 [01:41<00:27, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.3400000000000003e-05, 'epoch': 0.01}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 383/500 [01:41<00:27, 4.25it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 384/500 [01:42<00:27, 4.25it/s] {'loss': 0.0, 'learning_rate': 2.32e-05, 'epoch': 0.01}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 384/500 [01:42<00:27, 4.25it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 385/500 [01:42<00:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.01}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 385/500 [01:42<00:26, 4.27it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 386/500 [01:42<00:26, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.2800000000000002e-05, 'epoch': 0.01}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 386/500 [01:42<00:26, 4.28it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 387/500 [01:42<00:26, 4.30it/s] {'loss': 0.0, 'learning_rate': 2.26e-05, 'epoch': 0.01}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 387/500 [01:42<00:26, 4.30it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 388/500 [01:42<00:26, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.2400000000000002e-05, 'epoch': 0.01}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 388/500 [01:42<00:26, 4.27it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 389/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.22e-05, 'epoch': 0.01}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 389/500 [01:43<00:25, 4.29it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 390/500 [01:43<00:25, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.01}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 390/500 [01:43<00:25, 4.28it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 391/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.18e-05, 'epoch': 0.02}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 391/500 [01:43<00:25, 4.29it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 392/500 [01:43<00:25, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.16e-05, 'epoch': 0.02}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 392/500 [01:43<00:25, 4.29it/s] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 393/500 [01:44<00:24, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.1400000000000002e-05, 'epoch': 0.02}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 393/500 [01:44<00:24, 4.29it/s] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 394/500 [01:44<00:24, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.12e-05, 'epoch': 0.02}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 394/500 [01:44<00:24, 4.28it/s] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 395/500 [01:44<00:24, 4.28it/s] {'loss': 0.0, 'learning_rate': 2.1e-05, 'epoch': 0.02}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 395/500 [01:44<00:24, 4.28it/s] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 396/500 [01:44<00:24, 4.29it/s] {'loss': 0.0, 'learning_rate': 2.08e-05, 'epoch': 0.02}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 396/500 [01:44<00:24, 4.29it/s] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 397/500 [01:45<00:23, 4.30it/s] {'loss': 0.0, 'learning_rate': 2.06e-05, 'epoch': 0.02}
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 397/500 [01:45<00:23, 4.30it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 398/500 [01:45<00:23, 4.26it/s] {'loss': 0.0, 'learning_rate': 2.04e-05, 'epoch': 0.02}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 398/500 [01:45<00:23, 4.26it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 399/500 [01:45<00:23, 4.27it/s] {'loss': 0.0, 'learning_rate': 2.0200000000000003e-05, 'epoch': 0.02}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 399/500 [01:45<00:23, 4.27it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 400/500 [01:45<00:23, 4.28it/s] {'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.02}
80%|████████ | 400/500 [01:45<00:23, 4.28it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:09,545 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:09,545 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-400/special_tokens_map.json
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 401/500 [01:46<00:25, 3.95it/s] {'loss': 0.0, 'learning_rate': 1.9800000000000004e-05, 'epoch': 0.02}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 401/500 [01:46<00:25, 3.95it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 402/500 [01:46<00:24, 4.04it/s] {'loss': 0.0, 'learning_rate': 1.9600000000000002e-05, 'epoch': 0.02}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 402/500 [01:46<00:24, 4.04it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 403/500 [01:46<00:23, 4.11it/s] {'loss': 0.0, 'learning_rate': 1.94e-05, 'epoch': 0.02}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 403/500 [01:46<00:23, 4.11it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 404/500 [01:46<00:23, 4.17it/s] {'loss': 0.0, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.02}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 404/500 [01:46<00:23, 4.17it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 405/500 [01:47<00:22, 4.20it/s] {'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.02}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 405/500 [01:47<00:22, 4.20it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 406/500 [01:47<00:22, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.88e-05, 'epoch': 0.02}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 406/500 [01:47<00:22, 4.23it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 407/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.86e-05, 'epoch': 0.02}
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 407/500 [01:47<00:21, 4.25it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 408/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.84e-05, 'epoch': 0.02}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 408/500 [01:47<00:21, 4.25it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 409/500 [01:47<00:21, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.8200000000000002e-05, 'epoch': 0.02}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 409/500 [01:47<00:21, 4.25it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 410/500 [01:48<00:21, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.02}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 410/500 [01:48<00:21, 4.24it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 411/500 [01:48<00:20, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.78e-05, 'epoch': 0.02}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 411/500 [01:48<00:20, 4.25it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 412/500 [01:48<00:21, 4.04it/s] {'loss': 0.0, 'learning_rate': 1.76e-05, 'epoch': 0.02}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 412/500 [01:48<00:21, 4.04it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 413/500 [01:49<00:23, 3.67it/s] {'loss': 0.0, 'learning_rate': 1.74e-05, 'epoch': 0.02}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 413/500 [01:49<00:23, 3.67it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 414/500 [01:49<00:24, 3.58it/s] {'loss': 0.0, 'learning_rate': 1.7199999999999998e-05, 'epoch': 0.02}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 414/500 [01:49<00:24, 3.58it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 415/500 [01:49<00:25, 3.35it/s] {'loss': 0.0, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.02}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 415/500 [01:49<00:25, 3.35it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 416/500 [01:49<00:23, 3.59it/s] {'loss': 0.0, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.02}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 416/500 [01:49<00:23, 3.59it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 417/500 [01:50<00:22, 3.77it/s] {'loss': 0.0, 'learning_rate': 1.66e-05, 'epoch': 0.02}
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 417/500 [01:50<00:22, 3.77it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 418/500 [01:50<00:21, 3.90it/s] {'loss': 0.0, 'learning_rate': 1.6400000000000002e-05, 'epoch': 0.02}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 418/500 [01:50<00:21, 3.90it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 419/500 [01:50<00:20, 4.00it/s] {'loss': 0.0, 'learning_rate': 1.62e-05, 'epoch': 0.02}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 419/500 [01:50<00:20, 4.00it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 420/500 [01:50<00:19, 4.08it/s] {'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.02}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 420/500 [01:50<00:19, 4.08it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 421/500 [01:51<00:19, 4.15it/s] {'loss': 0.0, 'learning_rate': 1.58e-05, 'epoch': 0.02}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 421/500 [01:51<00:19, 4.15it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 422/500 [01:51<00:18, 4.17it/s] {'loss': 0.0, 'learning_rate': 1.56e-05, 'epoch': 0.02}
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 422/500 [01:51<00:18, 4.17it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 423/500 [01:51<00:18, 4.21it/s] {'loss': 0.0, 'learning_rate': 1.54e-05, 'epoch': 0.02}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 423/500 [01:51<00:18, 4.21it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 424/500 [01:51<00:17, 4.22it/s] {'loss': 0.0, 'learning_rate': 1.52e-05, 'epoch': 0.02}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 424/500 [01:51<00:17, 4.22it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 425/500 [01:52<00:17, 4.21it/s] {'loss': 0.0, 'learning_rate': 1.5e-05, 'epoch': 0.02}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 425/500 [01:52<00:17, 4.21it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 426/500 [01:52<00:17, 4.22it/s] {'loss': 0.0, 'learning_rate': 1.48e-05, 'epoch': 0.02}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 426/500 [01:52<00:17, 4.22it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 427/500 [01:52<00:17, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.4599999999999999e-05, 'epoch': 0.02}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 427/500 [01:52<00:17, 4.23it/s] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 428/500 [01:52<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.44e-05, 'epoch': 0.02}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 428/500 [01:52<00:16, 4.25it/s] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 429/500 [01:52<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.42e-05, 'epoch': 0.02}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 429/500 [01:52<00:16, 4.25it/s] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 430/500 [01:53<00:16, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.02}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 430/500 [01:53<00:16, 4.24it/s] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 431/500 [01:53<00:16, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.3800000000000002e-05, 'epoch': 0.02}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 431/500 [01:53<00:16, 4.27it/s] 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 432/500 [01:53<00:16, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.3600000000000002e-05, 'epoch': 0.02}
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 432/500 [01:53<00:16, 4.25it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 433/500 [01:53<00:15, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.3400000000000002e-05, 'epoch': 0.02}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 433/500 [01:53<00:15, 4.25it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 434/500 [01:54<00:15, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.32e-05, 'epoch': 0.02}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 434/500 [01:54<00:15, 4.25it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 435/500 [01:54<00:15, 4.23it/s] {'loss': 0.0, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.02}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 435/500 [01:54<00:15, 4.23it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 436/500 [01:54<00:15, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.2800000000000001e-05, 'epoch': 0.02}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 436/500 [01:54<00:15, 4.24it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 437/500 [01:54<00:14, 4.24it/s] {'loss': 0.0, 'learning_rate': 1.2600000000000001e-05, 'epoch': 0.02}
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 437/500 [01:54<00:14, 4.24it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 438/500 [01:55<00:14, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.24e-05, 'epoch': 0.02}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 438/500 [01:55<00:14, 4.25it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 439/500 [01:55<00:14, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.22e-05, 'epoch': 0.02}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 439/500 [01:55<00:14, 4.25it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 440/500 [01:55<00:14, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.2e-05, 'epoch': 0.02}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 440/500 [01:55<00:14, 4.27it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 441/500 [01:55<00:13, 4.28it/s] {'loss': 0.0, 'learning_rate': 1.18e-05, 'epoch': 0.02}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 441/500 [01:55<00:13, 4.28it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 442/500 [01:56<00:13, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.16e-05, 'epoch': 0.02}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 442/500 [01:56<00:13, 4.27it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 443/500 [01:56<00:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1400000000000001e-05, 'epoch': 0.02}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 443/500 [01:56<00:13, 4.26it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 444/500 [01:56<00:13, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1200000000000001e-05, 'epoch': 0.02}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 444/500 [01:56<00:13, 4.26it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 445/500 [01:56<00:12, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.02}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 445/500 [01:56<00:12, 4.26it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 446/500 [01:56<00:12, 4.27it/s] {'loss': 0.0, 'learning_rate': 1.08e-05, 'epoch': 0.02}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 446/500 [01:56<00:12, 4.27it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 447/500 [01:57<00:12, 4.25it/s] {'loss': 0.0, 'learning_rate': 1.06e-05, 'epoch': 0.02}
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 447/500 [01:57<00:12, 4.25it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 448/500 [01:57<00:12, 4.26it/s] {'loss': 0.0, 'learning_rate': 1.04e-05, 'epoch': 0.02}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 448/500 [01:57<00:12, 4.26it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 449/500 [01:57<00:11, 4.28it/s] {'loss': 0.0, 'learning_rate': 1.02e-05, 'epoch': 0.02}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 449/500 [01:57<00:11, 4.28it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 450/500 [01:57<00:11, 4.26it/s] {'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.02}
90%|█████████ | 450/500 [01:57<00:11, 4.26it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:21,641 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-450/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:21,641 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-450/special_tokens_map.json
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 451/500 [01:58<00:12, 4.00it/s] {'loss': 0.0, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.02}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 451/500 [01:58<00:12, 4.00it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 452/500 [01:58<00:11, 4.09it/s] {'loss': 0.0, 'learning_rate': 9.600000000000001e-06, 'epoch': 0.02}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 452/500 [01:58<00:11, 4.09it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 453/500 [01:58<00:11, 4.17it/s] {'loss': 0.0, 'learning_rate': 9.4e-06, 'epoch': 0.02}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 453/500 [01:58<00:11, 4.17it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 454/500 [01:58<00:10, 4.19it/s] {'loss': 0.0, 'learning_rate': 9.2e-06, 'epoch': 0.02}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 454/500 [01:58<00:10, 4.19it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 455/500 [01:59<00:10, 4.20it/s] {'loss': 0.0, 'learning_rate': 9e-06, 'epoch': 0.02}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 455/500 [01:59<00:10, 4.20it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 456/500 [01:59<00:10, 4.23it/s] {'loss': 0.0, 'learning_rate': 8.8e-06, 'epoch': 0.02}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 456/500 [01:59<00:10, 4.23it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 457/500 [01:59<00:10, 4.24it/s] {'loss': 0.0, 'learning_rate': 8.599999999999999e-06, 'epoch': 0.02}
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 457/500 [01:59<00:10, 4.24it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 458/500 [01:59<00:09, 4.26it/s] {'loss': 0.0, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.02}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 458/500 [01:59<00:09, 4.26it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 459/500 [02:00<00:09, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.200000000000001e-06, 'epoch': 0.02}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 459/500 [02:00<00:09, 4.29it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 460/500 [02:00<00:09, 4.29it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 460/500 [02:00<00:09, 4.29it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 461/500 [02:00<00:09, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.8e-06, 'epoch': 0.02}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 461/500 [02:00<00:09, 4.27it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 462/500 [02:00<00:08, 4.28it/s] {'loss': 0.0, 'learning_rate': 7.6e-06, 'epoch': 0.02}
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 462/500 [02:00<00:08, 4.28it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 463/500 [02:00<00:08, 4.27it/s] {'loss': 0.0, 'learning_rate': 7.4e-06, 'epoch': 0.02}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 463/500 [02:00<00:08, 4.27it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 464/500 [02:01<00:08, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.2e-06, 'epoch': 0.02}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 464/500 [02:01<00:08, 4.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 465/500 [02:01<00:08, 4.26it/s] {'loss': 0.0, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.02}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 465/500 [02:01<00:08, 4.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 466/500 [02:01<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.02}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 466/500 [02:01<00:07, 4.26it/s] 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 467/500 [02:01<00:07, 4.27it/s] {'loss': 0.0, 'learning_rate': 6.6e-06, 'epoch': 0.02}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 467/500 [02:01<00:07, 4.27it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 468/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.4000000000000006e-06, 'epoch': 0.02}
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 468/500 [02:02<00:07, 4.26it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 469/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6.2e-06, 'epoch': 0.02}
 94%|█████████▍| 470/500 [02:02<00:07, 4.26it/s] {'loss': 0.0, 'learning_rate': 6e-06, 'epoch': 0.02}
 94%|█████████▍| 471/500 [02:02<00:06, 4.26it/s] {'loss': 0.0, 'learning_rate': 5.8e-06, 'epoch': 0.02}
 94%|█████████▍| 472/500 [02:03<00:06, 4.27it/s] {'loss': 0.0, 'learning_rate': 5.600000000000001e-06, 'epoch': 0.02}
 95%|█████████▍| 473/500 [02:03<00:06, 4.27it/s] {'loss': 0.0, 'learning_rate': 5.4e-06, 'epoch': 0.02}
 95%|█████████▍| 474/500 [02:03<00:06, 4.28it/s] {'loss': 0.0, 'learning_rate': 5.2e-06, 'epoch': 0.02}
 95%|█████████▌| 475/500 [02:03<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 5e-06, 'epoch': 0.02}
 95%|█████████▌| 476/500 [02:04<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.02}
 95%|█████████▌| 477/500 [02:04<00:05, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.6e-06, 'epoch': 0.02}
 96%|█████████▌| 478/500 [02:04<00:05, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.4e-06, 'epoch': 0.02}
 96%|█████████▌| 479/500 [02:04<00:04, 4.28it/s] {'loss': 0.0, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.02}
 96%|█████████▌| 480/500 [02:04<00:04, 4.27it/s] {'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.02}
 96%|█████████▌| 481/500 [02:05<00:04, 4.26it/s] {'loss': 0.0, 'learning_rate': 3.8e-06, 'epoch': 0.02}
 96%|█████████▋| 482/500 [02:05<00:04, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.6e-06, 'epoch': 0.02}
 97%|█████████▋| 483/500 [02:05<00:03, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.02}
 97%|█████████▋| 484/500 [02:05<00:03, 4.27it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.02}
 97%|█████████▋| 485/500 [02:06<00:03, 4.23it/s] {'loss': 0.0, 'learning_rate': 3e-06, 'epoch': 0.02}
 97%|█████████▋| 486/500 [02:06<00:03, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-06, 'epoch': 0.02}
 97%|█████████▋| 487/500 [02:06<00:03, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.6e-06, 'epoch': 0.02}
 98%|█████████▊| 488/500 [02:06<00:02, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.02}
 98%|█████████▊| 489/500 [02:07<00:02, 4.19it/s] {'loss': 0.0, 'learning_rate': 2.2e-06, 'epoch': 0.02}
 98%|█████████▊| 490/500 [02:07<00:02, 4.20it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.02}
 98%|█████████▊| 491/500 [02:07<00:02, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.8e-06, 'epoch': 0.02}
 98%|█████████▊| 492/500 [02:07<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.02}
 99%|█████████▊| 493/500 [02:08<00:01, 4.19it/s] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-06, 'epoch': 0.02}
 99%|█████████▉| 494/500 [02:08<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.2000000000000002e-06, 'epoch': 0.02}
 99%|█████████▉| 495/500 [02:08<00:01, 4.18it/s] {'loss': 0.0, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.02}
 99%|█████████▉| 496/500 [02:08<00:00, 4.17it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.02}
 99%|█████████▉| 497/500 [02:09<00:00, 4.19it/s] {'loss': 0.0, 'learning_rate': 6.000000000000001e-07, 'epoch': 0.02}
100%|█████████▉| 498/500 [02:09<00:00, 4.20it/s] {'loss': 0.0, 'learning_rate': 4.0000000000000003e-07, 'epoch': 0.02}
100%|█████████▉| 499/500 [02:09<00:00, 4.19it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000002e-07, 'epoch': 0.02}
100%|██████████| 500/500 [02:09<00:00, 4.20it/s] {'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:33,483 >> tokenizer config file saved in output/text-20231209-152643-1e-4/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:33,483 >> Special tokens file saved in output/text-20231209-152643-1e-4/checkpoint-500/special_tokens_map.json
[INFO|trainer.py:1955] 2023-12-09 15:29:33,536 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 130.5539, 'train_samples_per_second': 7.66, 'train_steps_per_second': 3.83, 'train_loss': 0.00219873046875, 'epoch': 0.02}
100%|██████████| 500/500 [02:09<00:00, 3.85it/s]
[INFO|tokenization_utils_base.py:2428] 2023-12-09 15:29:33,559 >> tokenizer config file saved in output/text-20231209-152643-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-09 15:29:33,559 >> Special tokens file saved in output/text-20231209-152643-1e-4/special_tokens_map.json