## Fine-tuning run 3

An attempt to improve the model fine-tuned during run 1.

Checkpoint used: `checkpoint-12000`

* Trained for 6000 steps
* Used a custom learning-rate scheduler implemented in `custom_trainer.Seq2SeqTrainerCustomLinearScheduler` (see the sketch after this list):
  * `--learning_rate="3e-5"`
  * `--learning_rate_end="1e-5"`
* no warmup was used
* no WER improvement compared to checkpoint-12000 of run 1
* using `seed=43`
* checkpoints from this run are not uploaded
* uploading `src`, logs, TensorBoard logs, `trainer_state`
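
The actual scheduler implementation is in the uploaded `src` (`custom_trainer.py`); below is only a minimal sketch of what such a trainer can look like, assuming it linearly interpolates the LR from `--learning_rate` down to `--learning_rate_end` over the run:

```python
from functools import partial

import torch
from transformers import Seq2SeqTrainer


def linear_lr_lambda(step, total_steps, lr_start, lr_end):
    # LambdaLR multiplies the base LR (lr_start) by this factor,
    # so return the interpolated LR divided by lr_start.
    progress = min(step / max(total_steps, 1), 1.0)
    return (lr_start + (lr_end - lr_start) * progress) / lr_start


class Seq2SeqTrainerCustomLinearScheduler(Seq2SeqTrainer):
    """Sketch: linear LR decay from `learning_rate` to `learning_rate_end`."""

    def __init__(self, *args, learning_rate_end=1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.learning_rate_end = learning_rate_end

    def create_scheduler(self, num_training_steps, optimizer=None):
        if self.lr_scheduler is None:
            optimizer = optimizer if optimizer is not None else self.optimizer
            self.lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
                optimizer,
                partial(
                    linear_lr_lambda,
                    total_steps=num_training_steps,
                    lr_start=self.args.learning_rate,
                    lr_end=self.learning_rate_end,
                ),
            )
        return self.lr_scheduler
```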

## Advice
* I guess we need to use warmup when resuming training with a higher LR than the last LR of the previous run (see the first sketch after this list)
* need to set the number of steps > 6000, because the model improved WER very slowly
* probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training (see the second sketch below).
  otherwise, I guess, we
  * reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted averages)
  * scale the loss incorrectly
* can use the original Mozilla Common Voice dataset instead of HuggingFace's one (see the last sketch below).<br>
  the reason is that the original contains multiple voicings of the same sentence,
  so there is at least twice as much data.<br>
  to use this "additional" data, the train, validation, and test sets need to be enlarged using the `validated` set,
  the one that is absent from HuggingFace's CV11 dataset
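
For the first two points, a sketch of how the next run's arguments could look; the `output_dir`, `warmup_steps`, and `max_steps` values are hypothetical placeholders, not tested settings:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical resume config: warm the LR up instead of jumping straight
# from run 1's final LR to the new, higher one.
args = Seq2SeqTrainingArguments(
    output_dir="finetuning-run-4",  # hypothetical name
    learning_rate=3e-5,             # higher than the last LR of the previous run
    warmup_steps=500,               # placeholder: any warmup > 0 smooths the jump
    max_steps=12000,                # > 6000, since WER improved very slowly
)
```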
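For restoring the optimizer and scaler state, a minimal sketch (the `nn.Linear` model is only a stand-in for the real one; note that `transformers`' `Trainer` loads these files itself when `resume_from_checkpoint` is passed to `train()`):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler

ckpt = "checkpoint-12000"  # run-1 checkpoint directory

model = nn.Linear(4, 4)  # stand-in for the actual fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = GradScaler()

# Restore the momentum history (exponentially weighted averages) and the
# AMP loss scale instead of reinitializing them.
optimizer.load_state_dict(torch.load(f"{ckpt}/optimizer.pt", map_location="cpu"))
scaler.load_state_dict(torch.load(f"{ckpt}/scaler.pt"))
```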
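And a sketch of extracting the "additional" data from an original Common Voice download (the file names follow the CV release layout; the exact directory path is an assumption):

```python
import pandas as pd

cv_dir = "cv-corpus-11.0"  # assumed path to the downloaded locale directory

validated = pd.read_csv(f"{cv_dir}/validated.tsv", sep="\t")
splits = {name: pd.read_csv(f"{cv_dir}/{name}.tsv", sep="\t")
          for name in ("train", "dev", "test")}

# Clips present in `validated` but in none of the official splits are the
# extra voicings of already-covered sentences.
used_paths = pd.concat(splits.values())["path"]
extra = validated[~validated["path"].isin(used_paths)]

# Enlarge the training set with them (dev/test could be enlarged similarly,
# keeping speaker `client_id`s disjoint across splits).
train_enlarged = pd.concat([splits["train"], extra], ignore_index=True)
```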