## Rigel Pretrained Model

### Dataset

* **Size:** Approximately 2000 hours of speech and vocals.
* **Languages:**
    * English: ~800 hours
    * Spanish: ~200 hours
    * French: ~42 hours
    * Russian: ~188 hours
    * Arabic: ~70 hours
    * Japanese: ~140 hours
    * Chinese (Mandarin): ~70 hours
    * Korean: ~80 hours
    * Hindi: ~30 hours
    * Indonesian: ~53 hours
    * Tagalog: ~30 hours
    * Portuguese: ~40 hours
    * German: ~35 hours
    * Singing (all languages): ~190 hours
    * Common language: Unknown amount
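
As a quick consistency check, the per-language figures above can be tallied against the stated total. This is a minimal sketch: the names `HOURS`, `SINGING_HOURS`, and `summarize` are illustrative and not part of any Rigel tooling, and the "Common language" portion is left out because its size is not given.

```python
# Approximate hours per language, copied from the list above.
HOURS = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
}
SINGING_HOURS = 190  # singing, all languages combined


def summarize(hours, singing):
    """Return (speech total, overall total, per-language share of overall total)."""
    speech_total = sum(hours.values())
    total = speech_total + singing
    shares = {lang: h / total for lang, h in hours.items()}
    return speech_total, total, shares


speech_total, total, shares = summarize(HOURS, SINGING_HOURS)
print(f"Speech: {speech_total} h, speech + singing: {total} h")
for lang, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<20} {share:6.1%}")
```

The listed entries sum to 1,968 hours, so the unquantified "Common language" data would account for roughly the remainder of the approximately 2000-hour total.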

### Sampling Frequency

* **32kHz:** training done
* **40kHz:** retraining in progress

### Models

#### **Base Model**

* **Data:** Approximately 2000 hours of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch:** 40-20-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20-12-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
    * 1 x H100
    * 4 x L40S
    * 1 x RTX 4080
    * 1 x RTX 4070 Ti

### Expected Release Date

* July 22nd

