| loss | grad_norm | learning_rate | epoch | step | eval_loss | eval_f1_micro | eval_f1_macro | eval_precision | eval_recall | eval_runtime | eval_samples_per_second | eval_steps_per_second | train_runtime | train_samples_per_second | train_steps_per_second | total_flos | train_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0316 | 0.0846693217754364 | 5.1690821256038647e-05 | 0.4830917874396135 | 500 | | | | | | | | | | | | | |
| 0.0076 | 0.034613098949193954 | 3.3816425120772947e-06 | 0.966183574879227 | 1000 | | | | | | | | | | | | | |
| | | | 1.0 | 1035 | 0.027310708537697792 | 0.9825396825396825 | 0.9808089925902543 | 0.9880287310454908 | 0.9771112865035517 | 8.4322 | 107.089 | 6.76 | | | | | |
| | | | 1.0 | 1035 | | | | | | | | | 95.2688 | 86.912 | 10.864 | 7716522048675840.0 | 0.019097913535320817 |