Update README.md
README.md
CHANGED
@@ -1,76 +1,63 @@
-### Dataset is about ~2000 hours of speech and vocals
-### Supported (included) languages:

-~800 hrs of English

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-# Sampling frequency: 32k(done), 40k(retraining)
-
-#### Base and Fine tuned (FT) mdoels
-
-## Base model:
-data - approximate 2k hrs of low-mid quality data
-
-steps - 3890220
-
-batch - 40-20-2
-
-fp32
-
-Sampling frequency - 32k
-
-## Fine Tuned
-data - 102 hrs of high quality data
-
-steps - 2854856
-
-batch - 20-12-2
-
-fp32
-
-Sampling frequency - 32k
-
-
-
-# Hardware used:
-Cpu - amd epyc 9754
-
-Ram - 256gb
-
-Gpu's:
-
-1 - h100, 4 - L40s
-
-1 - rtx 4080, 1 - rtx 4070ti
-
-Expected release date - 22 july
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)

+## Rigel Pretrained Model

+### Dataset

+* **Size:** Approximately 2000 hours of speech and vocals.
+* **Languages:**
+* English: ~800 hours
+* Spanish: ~200 hours
+* French: ~42 hours
+* Russian: ~188 hours
+* Arabic: ~70 hours
+* Japanese: ~140 hours
+* Chinese (Mandarin): ~70 hours
+* Korean: ~80 hours
+* Hindi: ~30 hours
+* Indonesian: ~53 hours
+* Tagalog: ~30 hours
+* Portuguese: ~40 hours
+* German: ~35 hours
+* Singing (all languages): ~190 hours
+* Common language: Unknown amount

+### Sampling Frequency

+* **32kHz** (Done)
+* **40kHz** (Retraining)

+### Models

+#### **Base Model**

+* **Data:** Approximately 2000 hours of low-mid quality data.
+* **Steps:** 3,890,220
+* **Batch:** 40-20-2
+* **Precision:** FP32
+* **Sampling Frequency:** 32kHz

+#### **Fine-Tuned Model**

+* **Data:** 102 hours of high-quality data.
+* **Steps:** 2,854,856
+* **Batch:** 20-12-2
+* **Precision:** FP32
+* **Sampling Frequency:** 32kHz

+### Hardware Used

+* **CPU:** AMD EPYC 9754
+* **RAM:** 256GB
+* **GPUs:**
+* 1 x H100
+* 4 x L40s
+* 1 x RTX 4080
+* 1 x RTX 4070 Ti

+### Expected Release Date

+* July 22nd

+I hope this is more helpful! Let me know if you'd like any other adjustments or have any other questions.