## Rigel Pretrained Model
Base and fine-tuned models.
### Dataset
* **Size:** 1921 hours of speech and vocals in total.
* **Languages:**
  * Arabic: ~70 hours
  * Chinese (Mandarin): ~70 hours
  * English: ~800 hours
  * French: ~42 hours
  * German: ~35 hours
  * Hindi: ~30 hours
  * Indonesian: ~53 hours
  * Japanese: ~140 hours
  * Korean: ~80 hours
  * Portuguese: ~40 hours
  * Russian: ~188 hours
  * Singing (all languages): ~190 hours
  * Spanish: ~200 hours
  * Tagalog: ~30 hours
  * Common language: Unknown amount
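
To get a feel for how the data is distributed, the per-language figures above can be tallied with a few lines of Python. This is only a rough sketch: the dictionary transcribes the approximate numbers from the list, the "Common language" entry is left out because its amount is unknown, and the names `HOURS` and `shares` are illustrative. Because the listed values are approximate, their sum (about 1968 hours) does not exactly match the stated total of 1921 hours.

```python
# Approximate hours per category, transcribed from the list above.
# "Common language" is omitted because its amount is unknown.
HOURS = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}


def shares(hours):
    """Return each category's share of the listed total, in percent."""
    total = sum(hours.values())
    return {name: 100 * h / total for name, h in hours.items()}


listed_total = sum(HOURS.values())

# Print categories from largest to smallest share.
for name, pct in sorted(shares(HOURS).items(), key=lambda kv: -kv[1]):
    print(f"{name:<26}{pct:5.1f}%")
print(f"Listed total: ~{listed_total} hours")
```

English alone makes up roughly 40% of the listed hours, which is worth keeping in mind when evaluating the model on lower-resource languages.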
### Sampling Frequency
* **32 kHz** (Done)
* **40 kHz** (Retraining)
### Models
#### **Base Model**
* **Data:** 1921 hours in total of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch:** 40
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
#### **Fine-Tuned Model**
* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
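
For a rough sense of training scale, the step counts and batch sizes above can be multiplied to estimate how many training samples each run processed. This sketch assumes that "Steps" counts optimizer steps and that "Batch" is the number of samples per step; neither is stated explicitly here. The names `RUNS` and `samples_seen` are illustrative. Since the segment length per sample is not given, the result cannot be converted into hours or epochs.

```python
# Step counts and batch sizes as listed for each model above.
RUNS = {
    "base": {"steps": 3_890_220, "batch": 40},
    "fine_tuned": {"steps": 2_854_856, "batch": 20},
}


def samples_seen(steps: int, batch_size: int) -> int:
    """Total samples processed, assuming one batch per step."""
    return steps * batch_size


for name, cfg in RUNS.items():
    total = samples_seen(cfg["steps"], cfg["batch"])
    print(f"{name}: {total:,} samples")
```

Under these assumptions, the base run processed 155,608,800 samples and the fine-tuning run 57,097,120.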
### Hardware Used
* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
  * 1 x H100
  * 4 x L40s
  * 1 x RTX 4080
  * 1 x RTX 4070 Ti
### Expected Release Date
* July 22nd
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)