Loss Function Configuration
Noob here.
I'm fine-tuning, and both the training and validation loss drop to zero extremely quickly (within ~100 steps).
What loss function is the default inside this model? I have been trying to fine-tune it, and the loss falls to zero very quickly (within one epoch or so), while the validation metrics improve and then flatline.
I am concerned that the loss function may not be well suited to my metric (ROUGE-L).
I am fine-tuning on synthetically generated chart-table pairs. That would explain why both validation and training loss drop to zero so quickly, since the model learns the distribution very easily. It does not explain why the ROUGE metric does not also reach 1 when the loss has been pretty much zero for a handful of epochs.
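For reference, here is a minimal sketch of ROUGE-L as an F-score over the longest common subsequence (LCS) of tokens. It assumes plain whitespace tokenization and an unweighted F1; the function names are made up for illustration, and the metric library actually used in training may tokenize and weight things differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS ending before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best LCS seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 between two strings (assumption: whitespace tokens)."""
    pred = prediction.split()
    ref = reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this definition the score only reaches 1 when the generated text matches the reference token for token, so a generated table with even one wrong or missing token scores below 1.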
I was using the Adam optimizer with PyTorch Lightning, because when I tried to use Adafactor with a cosine scheduler I got a ton of errors.
Am I getting stuck in a local minimum? And should I work harder on integrating Adafactor and cosine scheduling?
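To make the cosine scheduling part concrete, this is a rough sketch of the learning-rate multiplier meant by "cosine schedule", with a linear warmup added as an assumption. The function name and the warmup/total step arguments are hypothetical. A function like this could be wrapped with `functools.partial` and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, and the resulting scheduler returned from Lightning's `configure_optimizers` with a per-step interval, but that wiring is not shown or tested here.

```python
import math


def cosine_with_warmup(step, warmup_steps, total_steps):
    """Multiplier in [0, 1] applied to the base learning rate at `step`."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, clamped to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from 1 down to 0.
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

The multiplier starts at 0, reaches 1 at the end of warmup, and decays to 0 by `total_steps`.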