| Hyperparameter | Value |
|---|---|
| Steps | 150k |
| Max length | 256 |
| LR | 1e-4 |
| LR schedule | constant |
| Optimizer | AdamW |
| beta_1, beta_2 | 0.9, 0.95 |
| Final eval loss | 2.245 |
| Final eval perplexity | 9.44 |
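
The two final eval figures above appear to be related by the usual identity, perplexity = exp(loss). The short check below is a sketch under the assumption that the reported loss is the mean per-token cross-entropy in nats; the card does not state this explicitly, and the function name `perplexity_from_loss` is illustrative only.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Return exp(loss), assuming loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


# Final eval loss as reported in the table above.
final_eval_loss = 2.245

ppl = perplexity_from_loss(final_eval_loss)
print(f"perplexity = {ppl:.2f}")
```

Rounded to two decimals, the result agrees with the reported perplexity of 9.44, which suggests the loss was computed with the natural logarithm rather than base 2.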

Dataset used to train bri25yu/t5like-60M