## Rigel Pretrained Model
Base and fine-tuned models.

### Dataset

* **Size:** 1,921 hours of speech and vocals in total.
* **Languages:**
    * Arabic: ~70 hours
    * Chinese (Mandarin): ~70 hours
    * English: ~800 hours
    * French: ~42 hours
    * German: ~35 hours
    * Hindi: ~30 hours
    * Indonesian: ~53 hours
    * Japanese: ~140 hours
    * Korean: ~80 hours
    * Portuguese: ~40 hours
    * Russian: ~188 hours
    * Singing (all languages): ~190 hours
    * Spanish: ~200 hours
    * Tagalog: ~30 hours
    * Common language: Unknown amount
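
The per-language figures above are approximate, so a quick tally helps show how the data is distributed. The sketch below copies the listed hours into a dictionary (the variable names are illustrative, not part of any Rigel tooling) and prints each category's share. "Common language" is left out because its amount is not given.

```python
# Approximate hours per category, copied from the list above.
# "Common language" is omitted because its amount is listed as unknown.
hours = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}

total_listed = sum(hours.values())
stated_total = 1921  # total given under "Size"

# Share of each category relative to the sum of the listed figures.
shares = {name: h / total_listed for name, h in hours.items()}

# Print categories from largest to smallest.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {h:>4} h  {shares[name]:6.1%}")

print(f"Sum of listed figures: {total_listed} h (stated total: {stated_total} h)")
```

The listed figures add up to 1,968 hours, somewhat above the stated 1,921-hour total. This is presumably because the per-language values are rounded estimates.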

### Sampling Frequency

* **32 kHz** (done)
* **40 kHz** (retraining)

### Models

#### **Base Model**

* **Data:** 1,921 hours of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch:** 40
* **Precision:** FP32
* **Sampling Rate:** 32 kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
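
To compare the two runs, the step counts and batch sizes above can be multiplied to estimate how many training samples each model processed. This is a rough sketch that assumes "Batch" is the number of samples per step and that every step processes exactly one full batch. The `TrainingRun` class is hypothetical and only used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical container for the figures listed in this model card."""
    name: str
    steps: int
    batch_size: int
    data_hours: int

    @property
    def samples_seen(self) -> int:
        # Assumption: each step processes exactly one full batch.
        return self.steps * self.batch_size


runs = [
    TrainingRun("Base", steps=3_890_220, batch_size=40, data_hours=1921),
    TrainingRun("Fine-tuned", steps=2_854_856, batch_size=20, data_hours=102),
]

for run in runs:
    print(f"{run.name}: {run.samples_seen:,} samples over {run.data_hours} h of data")
```

Under these assumptions the base run processed about 155.6 million samples and the fine-tuning run about 57.1 million. The card does not state the clip length, so these counts cannot be converted into epochs.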

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
    * 1 x H100
    * 4 x L40S
    * 1 x RTX 4080
    * 1 x RTX 4070 Ti

### Expected Release Date

* July 22nd

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)