## Fine-tuning run 2

Tried to improve the model fine-tuned during run 1.

Checkpoint used: checkpoint-12000

* The learning rate picked for run 2 turned out to be too small:
  WER did not improve compared to run 1.
* WER during run 2 followed the trajectory of the end of run 1
  (from checkpoint-8000 to checkpoint-10000).
* Stopped run 2 after 3000 steps.
* Not uploading checkpoints from this run.
* Uploading training stdout logs and TensorBoard logs.

## Advice

* For the next fine-tuning it's better to use a higher learning rate.
  As for the LR scheduler, it's better to:
  * either use a constant learning rate scheduler,
  * or manually instantiate a linear scheduler with warmup (`get_linear_schedule_with_warmup` in `transformers`)
    and set `num_training_steps` larger than the actual number of optimization steps in the run,
    so that the LR at the end stays well above 0 (see the scheduler sketch below)
* Need to use a `seed` other than the one used during run 1, e.g. `seed=43`.<br>
  The seed actually used for train dataset reshuffling depends on the epoch counter, which is advanced via
  `train_dataloader.dataset.set_epoch(train_dataloader.dataset._epoch + 1)`;
  however, when resuming training `train_dataloader.dataset._epoch` is reset to 0,
  so a different seed is needed to get a different shuffling order (see the seed sketch below)
* Can use the original Mozilla Common Voice dataset instead of the HuggingFace one.<br>
  The reason is that the original contains multiple recordings of the same sentence,
  so there is at least twice as much data.<br>
  To use this "additional" data, the train, validation, and test sets need to be enlarged using the `validated` set,
  which is absent from HuggingFace's CV11 dataset (see the dataset sketch below)
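
A minimal sketch of the scheduler advice above, assuming AdamW and the `get_linear_schedule_with_warmup` /
`get_constant_schedule_with_warmup` helpers from `transformers`; the step counts, warmup length, and placeholder
model are illustrative assumptions, not values used in these runs.

```python
# Sketch: keep the LR well above 0 at the end of the run by giving the linear
# scheduler an inflated num_training_steps. All numbers are illustrative.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # placeholder for the Whisper model being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

actual_steps = 3000     # number of optimization steps actually planned
inflated_steps = 30000  # >> actual_steps, so LR decays only slightly during the run

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,
    num_training_steps=inflated_steps,  # decay target far beyond the real run length
)

# After actual_steps steps the LR has decayed by only roughly
# actual_steps / inflated_steps of the peak value instead of approaching 0.
# Alternatively, transformers.get_constant_schedule_with_warmup keeps the LR flat
# after warmup, which matches the "constant scheduler" option above.
```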
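
A minimal sketch of the seeding advice, assuming the training data is loaded as a streaming `datasets`
`IterableDataset` and shuffled with a buffer; the dataset name, language code, and buffer size are
illustrative assumptions.

```python
# Sketch: pass a different shuffling seed for the next run so that resuming
# (which resets the internal epoch counter to 0) does not replay run 1's data order.
from datasets import load_dataset

SHUFFLE_SEED = 43  # different from the seed used in run 1

train_ds = load_dataset(
    "mozilla-foundation/common_voice_11_0",
    "be",                # language code is an assumption
    split="train",
    streaming=True,
)
train_ds = train_ds.shuffle(seed=SHUFFLE_SEED, buffer_size=500)

# The effective shuffling order depends on both the seed and the epoch counter.
# Since _epoch starts from 0 again after resuming, only a new seed changes the
# order relative to run 1.
train_ds.set_epoch(0)
```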
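
A minimal sketch of how the extra data from the original Mozilla Common Voice download could be folded into
the train set; the directory path and TSV/column names (`validated.tsv`, `sentence`) follow the standard
Common Voice archive layout, and filtering by sentence text is an assumption about how to keep dev/test
sentences out of training.

```python
# Sketch: enlarge the train set with clips from validated.tsv that do not share
# a sentence with dev.tsv / test.tsv. Paths and column names are assumptions
# based on the standard Common Voice archive layout.
import pandas as pd

cv_dir = "cv-corpus-11.0-2022-09-21/be"  # path to the extracted CV download (assumption)

validated = pd.read_csv(f"{cv_dir}/validated.tsv", sep="\t")
dev = pd.read_csv(f"{cv_dir}/dev.tsv", sep="\t")
test = pd.read_csv(f"{cv_dir}/test.tsv", sep="\t")

# Exclude any clip whose sentence also appears in dev or test, so evaluation
# sentences never leak into training.
held_out_sentences = set(dev["sentence"]) | set(test["sentence"])
extra_train = validated[~validated["sentence"].isin(held_out_sentences)]

print(f"{len(extra_train)} extra training clips available from the validated set")
```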