Warm-started from the "Chills" single-speaker male model (not available on HF at the time of writing), then trained for 25 (de facto 50) epochs. Batch size 16; learning rate √2 × 10⁻³ (≈ 1.41e-3) for the first 15(?) epochs and 5√2 × 10⁻⁴ (≈ 7.07e-4) for the next 10.
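For reference, the two learning-rate values can be written as a small step schedule. This is only a sketch: the function name `learning_rate` and the parameter `switch_epoch` are hypothetical, epochs are assumed to be counted from 0, and the switch point of 15 is itself uncertain, as noted above.

```python
import math


def learning_rate(epoch, switch_epoch=15):
    """Step schedule: sqrt(2)e-3 before switch_epoch, 5*sqrt(2)e-4 afterwards.

    `switch_epoch=15` is an assumption; the exact switch point is uncertain.
    Epochs are counted from 0.
    """
    if epoch < switch_epoch:
        return math.sqrt(2) * 1e-3
    return 5 * math.sqrt(2) * 1e-4
```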
Dataset: NST Norwegian Speech Synthesis (CC0), augmented like so:
- Make a copy of the dataset.
- Join the two shortest clips of the copy with 100 ms of silence between them, then replace them with the joined version. Repeat until the shortest clip is at least 6 seconds long.
- Shuffle the original together with the copy.
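
The augmentation steps above might look roughly like the following. This is a minimal sketch, not the script actually used: clips are assumed to be plain lists of audio samples, the function names `join_short_clips` and `augment` are hypothetical, and the order in which the two clips are joined (shortest first) is an assumption. The matching transcripts would need to be joined in the same way, which is omitted here.

```python
import heapq
import random


def join_short_clips(clips, sample_rate, min_seconds=6.0, gap_seconds=0.1):
    """Merge the two shortest clips until the shortest is >= min_seconds."""
    min_len = int(min_seconds * sample_rate)
    gap = [0.0] * int(gap_seconds * sample_rate)  # 100 ms of silence
    # Heap entries are (length, unique id, samples); the unique id breaks
    # ties so the sample lists themselves are never compared.
    heap = [(len(c), i, list(c)) for i, c in enumerate(clips)]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) >= 2 and heap[0][0] < min_len:
        _, _, first = heapq.heappop(heap)   # shortest clip
        _, _, second = heapq.heappop(heap)  # second shortest clip
        joined = first + gap + second       # join order is an assumption
        heapq.heappush(heap, (len(joined), next_id, joined))
        next_id += 1
    return [samples for _, _, samples in heap]


def augment(clips, sample_rate, seed=0):
    """Return the original clips shuffled together with the joined copy."""
    joined_copy = join_short_clips(clips, sample_rate)
    combined = [list(c) for c in clips] + joined_copy
    random.Random(seed).shuffle(combined)
    return combined
```

The combined set contains every utterance twice, once as a short original clip and once inside a longer joined clip.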