MUSTAR committed on
Commit 578213e · verified · 1 Parent(s): 02c2343

Update README.md

Files changed (1)
  1. README.md +46 -59
README.md CHANGED
@@ -1,76 +1,63 @@
- ### Dataset is about ~2000 hours of speech and vocals
- ### Supported (included) languages:

- ~800 hrs of English

- ~200 Spanish

- ~42 French

- ~188 Russian

- ~70 Arabic

- ~140 Japanese

- ~70 Chinese (Mandarin)

- ~80 Korean

- ~30 Hindi

- ~53 Indonesian

- ~30 Tagalog

- ~40 Portuguese

- ~35 German

- ~190 singing (all languages)

- common language (I don't remember how much data was there)

- # Sampling frequency: 32k(done), 40k(retraining)
-
- #### Base and Fine tuned (FT) mdoels
-
- ## Base model:
- data - approximate 2k hrs of low-mid quality data
-
- steps - 3890220
-
- batch - 40-20-2
-
- fp32
-
- Sampling frequency - 32k
-
- ## Fine Tuned
- data - 102 hrs of high quality data
-
- steps - 2854856
-
- batch - 20-12-2
-
- fp32
-
- Sampling frequency - 32k
-
-
-
- # Hardware used:
- Cpu - amd epyc 9754
-
- Ram - 256gb
-
- Gpu's:
-
- 1 - h100, 4 - L40s
-
- 1 - rtx 4080, 1 - rtx 4070ti
-
- Expected release date - 22 july
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)

+ ## Rigel Pretrained Model

+ ### Dataset

+ * **Size:** Approximately 2000 hours of speech and vocals.
+ * **Languages:**
+ * English: ~800 hours
+ * Spanish: ~200 hours
+ * French: ~42 hours
+ * Russian: ~188 hours
+ * Arabic: ~70 hours
+ * Japanese: ~140 hours
+ * Chinese (Mandarin): ~70 hours
+ * Korean: ~80 hours
+ * Hindi: ~30 hours
+ * Indonesian: ~53 hours
+ * Tagalog: ~30 hours
+ * Portuguese: ~40 hours
+ * German: ~35 hours
+ * Singing (all languages): ~190 hours
+ * Common language: Unknown amount

+ ### Sampling Frequency

+ * **32kHz** (Done)
+ * **40kHz** (Retraining)

+ ### Models

+ #### **Base Model**

+ * **Data:** Approximately 2000 hours of low-mid quality data.
+ * **Steps:** 3,890,220
+ * **Batch:** 40-20-2
+ * **Precision:** FP32
+ * **Sampling Frequency:** 32kHz

+ #### **Fine-Tuned Model**

+ * **Data:** 102 hours of high-quality data.
+ * **Steps:** 2,854,856
+ * **Batch:** 20-12-2
+ * **Precision:** FP32
+ * **Sampling Frequency:** 32kHz

+ ### Hardware Used

+ * **CPU:** AMD EPYC 9754
+ * **RAM:** 256GB
+ * **GPUs:**
+ * 1 x H100
+ * 4 x L40s
+ * 1 x RTX 4080
+ * 1 x RTX 4070 Ti

+ ### Expected Release Date

+ * July 22nd

+ I hope this is more helpful! Let me know if you'd like any other adjustments or have any other questions.
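
As a quick sanity check on the figures in the updated README, the per-category hours can be summed and compared with the stated total of approximately 2000 hours. The sketch below is illustrative only: the dictionary name `hours` and the helpers `total_hours` and `share` are hypothetical, the values are the README's own approximations, the categories are assumed not to overlap, and the "Common language" entry is left out because the README gives no figure for it.

```python
# Approximate hours per category, copied from the updated README.
# "Common language" is omitted: the README lists it as an unknown amount.
hours = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}


def total_hours(table):
    """Sum the listed hours (assumes categories do not overlap)."""
    return sum(table.values())


def share(table, key):
    """Fraction of the listed total contributed by one category."""
    return table[key] / total_hours(table)


if __name__ == "__main__":
    total = total_hours(hours)
    print(f"Listed total: ~{total} hours")
    # Print categories from largest to smallest with their share of the total.
    for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<25} {h:>4} h  {h / total:6.1%}")
```

The listed categories sum to about 1968 hours, so the stated figure of roughly 2000 hours is consistent if the unquantified "Common language" portion is on the order of 30 hours. Since every figure is approximate, that remainder is only a rough estimate.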