Fine-tuning run 3
Tried to improve the model fine-tuned during run 1.
Checkpoint used: checkpoint-12000
- Trained for 6000 steps
- Used a custom learning-rate scheduler implemented in
  custom_trainer.Seq2SeqTrainerCustomLinearScheduler
  (see the scheduler sketch after this list):
  --learning_rate="3e-5"
  --learning_rate_end="1e-5"
- no warmup was used
- no WER improvements compared to checkpoint-12000 of run 1
- used seed=43
- did not upload checkpoints from this run
- uploaded src, logs, TensorBoard logs, trainer_state
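
A minimal sketch of how the custom scheduler in custom_trainer.Seq2SeqTrainerCustomLinearScheduler might be wired up; the real implementation lives in src, and the way learning_rate_end is passed in here (a constructor kwarg) is an assumption:

```python
# Sketch only: linear decay from --learning_rate to --learning_rate_end.
from torch.optim.lr_scheduler import LambdaLR
from transformers import Seq2SeqTrainer

class Seq2SeqTrainerCustomLinearScheduler(Seq2SeqTrainer):
    """Decays the LR linearly from args.learning_rate to learning_rate_end."""

    def __init__(self, *args, learning_rate_end=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.learning_rate_end = learning_rate_end

    def create_scheduler(self, num_training_steps, optimizer=None):
        optimizer = optimizer if optimizer is not None else self.optimizer
        lr_start = self.args.learning_rate  # 3e-5 in this run
        lr_end = self.learning_rate_end     # 1e-5 in this run

        def lr_lambda(step):
            # Linear interpolation between lr_start and lr_end, expressed
            # as a multiplier of lr_start (LambdaLR semantics).
            frac = min(step / max(1, num_training_steps), 1.0)
            return (lr_start + frac * (lr_end - lr_start)) / lr_start

        self.lr_scheduler = LambdaLR(optimizer, lr_lambda)
        return self.lr_scheduler
```
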
Advice
- I guess we need to use warmup when resuming training with an LR higher than the last LR of the previous run (see the warmup sketch after this list)
- need to set the number of steps > 6000, because the model improved WER very slowly
- probably need to load
  optimizer.pt
  and scaler.pt
  from the checkpoint before resuming training (see the restore sketch after this list). Otherwise, I guess, we:
  - reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted averages)
  - scale the loss incorrectly
- can use the original Mozilla Common Voice dataset instead of HuggingFace's CV11.
  The reason is that the original contains multiple voicings of the same sentence, so there is at least twice as much data.
  To use this "additional" data, the train, validation, and test sets need to be enlarged using the validated
  set - the one that is absent from HuggingFace's CV11 dataset (see the data-prep sketch after this list).
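
Warmup sketch. One hedged way to get warmup plus linear decay to an end LR is the built-in transformers schedule get_polynomial_decay_schedule_with_warmup with power=1.0; the model stand-in and the step counts below are assumptions, not values from this run:

```python
# Sketch only: warmup + linear decay when resuming with a higher LR.
import torch
from transformers import get_polynomial_decay_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,      # assumed: ramp up before exceeding the old LR
    num_training_steps=20000,  # assumed: > 6000, per the advice above
    lr_end=1e-5,               # same target as --learning_rate_end
    power=1.0,                 # power=1.0 => linear decay after warmup
)
```
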
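Restore sketch. The states in question are the ones transformers.Trainer writes into each checkpoint directory; Trainer restores them itself if you pass resume_from_checkpoint to trainer.train(), and the helper name below is hypothetical:

```python
# Sketch: reload the states saved by transformers.Trainer so resuming keeps
# Adam's momentum/variance history and the AMP loss scale.
# (Trainer does this automatically via
#  trainer.train(resume_from_checkpoint="checkpoint-12000").)
import torch

def restore_training_state(checkpoint_dir, optimizer, scaler=None):
    """Hypothetical helper; scaler is the torch.cuda.amp.GradScaler
    used for fp16 training, if any."""
    optimizer.load_state_dict(
        torch.load(f"{checkpoint_dir}/optimizer.pt", map_location="cpu"))
    if scaler is not None:
        scaler.load_state_dict(
            torch.load(f"{checkpoint_dir}/scaler.pt", map_location="cpu"))
```
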
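Data-prep sketch. A rough illustration of pulling the extra voicings out of the original Common Voice TSVs; the paths and locale ("ru") are hypothetical, and for simplicity it enlarges only the train split, skipping sentences that occur in dev/test so evaluation sentences do not leak into training:

```python
# Sketch of enlarging the data from the original Common Voice release.
import pandas as pd

root = "cv-corpus/ru"  # hypothetical download location
splits = {name: pd.read_csv(f"{root}/{name}.tsv", sep="\t")
          for name in ("train", "dev", "test", "validated")}

# "path" is the unique clip file name; these clips already belong to a split.
assigned = pd.concat([splits["train"], splits["dev"], splits["test"]])["path"]

# Validated clips not in any split: the "additional" voicings that the
# HuggingFace CV11 splits do not expose.
extra = splits["validated"][~splits["validated"]["path"].isin(assigned)]

# Add them to train only, dropping sentences that occur in dev/test.
held_out = pd.concat([splits["dev"], splits["test"]])["sentence"]
train_enlarged = pd.concat(
    [splits["train"], extra[~extra["sentence"].isin(held_out)]],
    ignore_index=True)
```
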