## Rigel Pretrained Model

### Dataset

* **Size:** Approximately 2000 hours of speech and vocals.
* **Languages:**
  * English: ~800 hours
  * Spanish: ~200 hours
  * French: ~42 hours
  * Russian: ~188 hours
  * Arabic: ~70 hours
  * Japanese: ~140 hours
  * Chinese (Mandarin): ~70 hours
  * Korean: ~80 hours
  * Hindi: ~30 hours
  * Indonesian: ~53 hours
  * Tagalog: ~30 hours
  * Portuguese: ~40 hours
  * German: ~35 hours
  * Singing (all languages): ~190 hours
  * Common language: Unknown amount

### Sampling Frequency

* **32kHz** (Done)
* **40kHz** (Retraining)

### Models

#### **Base Model**

* **Data:** Approximately 2000 hours of low-mid quality data.
* **Steps:** 3,890,220
* **Batch:** 40-20-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20-12-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
  * 1 x H100
  * 4 x L40s
  * 1 x RTX 4080
  * 1 x RTX 4070 Ti

### Expected Release Date

* July 22nd
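
As a quick sanity check on the figures in the Dataset section, the per-language hours can be tallied and turned into shares. This is only an illustrative sketch: the names `HOURS`, `total_hours`, and `shares` are made up for this example, and the numbers are the approximate values from the list. The "Common language" entry is left out because its size is unknown. The listed entries add up to about 1968 hours, which is consistent with the stated total of roughly 2000 hours.

```python
# Approximate per-language hours, copied from the Dataset list.
# "Common language" is omitted because its amount is not stated.
HOURS = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}


def total_hours(hours):
    """Sum of all listed hours."""
    return sum(hours.values())


def shares(hours):
    """Fraction of the listed total contributed by each entry."""
    total = total_hours(hours)
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    print(f"Listed total: ~{total_hours(HOURS)} hours")
    # Print entries from largest to smallest share.
    for name, share in sorted(shares(HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<26}{share:6.1%}")
```

Shares are computed against the listed total rather than the rounded 2000-hour figure, so they sum to 1 by construction.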