mamba_0_5_distill / train_results.json
Junxiong Wang
add models
083ba79
{
    "epoch": 1.0,
    "total_flos": 8.846878004006093e+16,
    "train_loss": 25.182933448444604,
    "train_runtime": 193152.484,
    "train_samples": 19473081,
    "train_samples_per_second": 21.016,
    "train_steps_per_second": 0.328
}
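
The raw fields are easier to read after a few unit conversions. The following is a minimal sketch, not part of the repository: it assumes `train_runtime` is reported in seconds and that the two `*_per_second` fields are averages over the whole run, as is the usual convention for metrics files with these field names. The helper `derive_metrics` and its output keys are hypothetical names chosen for illustration. Because the reported rates are rounded to three decimals, the derived step count and samples-per-step figure are approximations only.

```python
import json

# Values copied verbatim from train_results.json.
RAW = """
{
    "epoch": 1.0,
    "total_flos": 8.846878004006093e+16,
    "train_loss": 25.182933448444604,
    "train_runtime": 193152.484,
    "train_samples": 19473081,
    "train_samples_per_second": 21.016,
    "train_steps_per_second": 0.328
}
"""


def derive_metrics(results):
    """Derive human-readable quantities from the reported metrics.

    Hypothetical helper. Assumes train_runtime is in seconds and the
    per-second rates are run-wide averages.
    """
    runtime_s = results["train_runtime"]
    steps_per_s = results["train_steps_per_second"]
    samples_per_s = results["train_samples_per_second"]
    return {
        # Wall-clock duration in hours.
        "runtime_hours": runtime_s / 3600,
        # Approximate optimizer steps: rate times duration.
        "approx_steps": round(runtime_s * steps_per_s),
        # Approximate samples consumed per step (ratio of the two rates).
        "approx_samples_per_step": samples_per_s / steps_per_s,
    }


derived = derive_metrics(json.loads(RAW))
for name, value in derived.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the run lasted a little under 54 hours, and the ratio of the two rates suggests roughly 64 samples per step.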