Why don't the available checkpoints start from a lower number of steps / amount of data seen?
#6 opened by user09180912480
The 7B model has checkpoints from step 150 to step 1000, with only 1B tokens seen. Why aren't such early checkpoints available for the 13B model?