# Fine-tuning run 3

Tried to improve the model fine-tuned during run 1.

Checkpoint used: checkpoint-12000

- Trained for 6000 steps
- Used a custom learning rate scheduler implemented in `custom_trainer.Seq2SeqTrainerCustomLinearScheduler` (a sketch follows this list):
  - `--learning_rate="3e-5"`
  - `--learning_rate_end="1e-5"`
- No warmup was used
- No WER improvement compared to checkpoint-12000 of run 1
- Used `seed=43`
- Not uploading checkpoints from this run
- Uploading src, logs, TensorBoard logs, trainer_state
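
For reference, below is a minimal sketch of what a linear decay from `--learning_rate` to `--learning_rate_end` without warmup could look like when implemented by overriding `Trainer.create_scheduler`; the `learning_rate_end` keyword and the internals are assumptions, not the actual contents of `custom_trainer.py`.

```python
# Minimal sketch (not the actual custom_trainer.py): a trainer whose scheduler
# decays the LR linearly from args.learning_rate down to learning_rate_end,
# with no warmup. The learning_rate_end keyword is an assumed interface.
from torch.optim.lr_scheduler import LambdaLR
from transformers import Seq2SeqTrainer


class Seq2SeqTrainerCustomLinearScheduler(Seq2SeqTrainer):
    def __init__(self, *args, learning_rate_end: float = 1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.learning_rate_end = learning_rate_end

    def create_scheduler(self, num_training_steps: int, optimizer=None):
        optimizer = self.optimizer if optimizer is None else optimizer
        lr_start = self.args.learning_rate   # e.g. 3e-5
        lr_end = self.learning_rate_end      # e.g. 1e-5

        def lr_lambda(step: int) -> float:
            # Linear interpolation from lr_start to lr_end over all training steps,
            # expressed as a multiplicative factor on the base LR.
            progress = min(step / max(1, num_training_steps), 1.0)
            return (lr_start + (lr_end - lr_start) * progress) / lr_start

        self.lr_scheduler = LambdaLR(optimizer, lr_lambda)
        return self.lr_scheduler
```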

## Advice

- I guess we need to use warmup when resuming training with a higher LR than the last LR of the previous run (see the first sketch below)
- Need to set the number of steps > 6000, because the model improved WER very slowly
- Probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training (see the second sketch below); otherwise, I guess, we
  - reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted averages)
  - scale the loss incorrectly
- Can use the original Mozilla Common Voice dataset instead of the HuggingFace one (see the last sketch below).
  The reason is that the original contains multiple voicings of the same sentence, so there is at least twice as much data.
  To use this "additional" data, the train, validation and test sets need to be enlarged using the validated set, which is absent from HuggingFace's CV11 dataset.
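
If warmup is added in a follow-up run, the standard `Seq2SeqTrainingArguments` fields are enough; the concrete values below are only illustrative, not settings used in this run:

```python
# Illustrative values only: resume with warmup and a longer schedule,
# using standard Seq2SeqTrainingArguments fields.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="output/run_4",  # hypothetical next run
    max_steps=12000,            # > 6000, as advised above
    learning_rate=3e-5,
    warmup_steps=500,           # illustrative warmup length
)
```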
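
On loading the optimizer and scaler state: a minimal sketch of what that could look like, assuming the usual Hugging Face Trainer checkpoint layout and a speech seq2seq model; the path is illustrative. Note that passing `resume_from_checkpoint` to `Trainer.train()` restores `optimizer.pt`, `scheduler.pt` and `scaler.pt` automatically, which may be the simpler route.

```python
# Sketch: restore optimizer and GradScaler state so that Adam's exponentially
# weighted averages and the AMP loss scale are not reset on resume.
# The checkpoint path is illustrative.
import torch
from transformers import AutoModelForSpeechSeq2Seq

ckpt_dir = "run_1/checkpoint-12000"

model = AutoModelForSpeechSeq2Seq.from_pretrained(ckpt_dir)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = torch.cuda.amp.GradScaler()

optimizer.load_state_dict(torch.load(f"{ckpt_dir}/optimizer.pt", map_location="cpu"))
scaler.load_state_dict(torch.load(f"{ckpt_dir}/scaler.pt", map_location="cpu"))
```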
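
For the original Common Voice data, a sketch of how the extra clips could be collected from `validated.tsv`; the path is illustrative, and the TSV and column names follow the standard CV archive layout:

```python
# Sketch (illustrative path): collect clips present in validated.tsv but absent
# from the released train/dev/test splits; these rows can then be distributed
# among the enlarged train, validation and test sets.
import pandas as pd

cv_dir = "cv-corpus-11.0/<lang>"  # path to the downloaded language folder


def read_split(name: str) -> pd.DataFrame:
    return pd.read_csv(f"{cv_dir}/{name}.tsv", sep="\t")


validated, train, dev, test = (read_split(s) for s in ["validated", "train", "dev", "test"])

already_released = set(train["path"]) | set(dev["path"]) | set(test["path"])
extra = validated[~validated["path"].isin(already_released)]
print(f"{len(extra)} additional validated clips beyond the released splits")
```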