## Fine-tuning run 3

Tried to improve the model fine-tuned during run 1. Checkpoint used: checkpoint-12000

* Trained for 6000 steps
* Used a custom learning rate scheduler initialized in `custom_trainer.Seq2SeqTrainerCustomLinearScheduler`:
  * `--learning_rate="3e-5"`
  * `--learning_rate_end="1e-5"`
  * no warmup was used
* no WER improvement compared to checkpoint-12000 of run 1
* using `seed=43`
* not uploading checkpoints from this run
* uploading src, logs, TensorBoard logs, trainer_state

## Advice

* I guess we need to use warmup when resuming training with an LR higher than the last LR of the previous run
* need to set the number of steps > 6000, because the model's WER improved very slowly
* probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training. Otherwise, I guess, we
  * reinitialize the optimizer and lose the history of the parameters' momentum (exponentially weighted average)
  * scale the loss incorrectly
* can use the original Mozilla Common Voice dataset instead of the HuggingFace one.
  The reason is that the original contains multiple recordings of the same sentence, so there is at least twice as much data.
  To use this "additional" data, the train, validation, and test sets need to be enlarged using the `validated` set, the one that is absent in HuggingFace's CV11 dataset
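
The warmup suggestion above can be sketched as a plain function that maps a step to a learning rate: a linear ramp from 0 up to the starting LR, followed by a linear decay to the end LR. This is only an illustration, not the code of `custom_trainer.Seq2SeqTrainerCustomLinearScheduler`; the function name `lr_at_step` and the 500-step warmup are assumptions made for the example.

```python
def lr_at_step(step, total_steps, lr_start=3e-5, lr_end=1e-5, warmup_steps=0):
    """Return the learning rate for a 0-based optimizer step.

    Linear warmup from 0 to lr_start over warmup_steps, then linear decay
    from lr_start to lr_end over the remaining steps.
    """
    if warmup_steps < 0 or total_steps <= warmup_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        # Warmup phase: the LR grows proportionally to the step index.
        return lr_start * step / warmup_steps
    decay_steps = total_steps - warmup_steps
    # Clamp so that steps past the end of training stay at lr_end.
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return lr_start + (lr_end - lr_start) * progress


# Hypothetical settings: 6000 steps as in run 3, plus an assumed 500-step warmup.
for step in (0, 250, 500, 3250, 6000):
    print(step, lr_at_step(step, total_steps=6000, warmup_steps=500))
```

With `warmup_steps=0` the function starts directly at `3e-5`, which corresponds to the no-warmup setup of run 3; a non-zero warmup avoids that jump when the previous run ended at a lower LR.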