Commit d0a22d2 · 1 Parent(s): 0ae50c5
Update README.md
README.md
CHANGED
@@ -16,6 +16,12 @@ Since seamless M4t wav2vec2 is trained on multilingual data, it makes this model
16 |
17 |  <img width="993" alt="Screenshot 2023-11-19 at 11 53 52 PM" src="https://github.com/dubverse-ai/MahaTTS/assets/32906806/7429d3b6-3f19-4bd8-9005-ff9e16a698f8">
18 |
19 |
20 |  ## Features
21 |  1. Multilinguality
@@ -38,78 +44,3 @@ pip install maha-tts
38 |  - [ ] Smolie - indic
39 |  - [ ] Optimizations for inference
40 |
41 | -## Some Generated Samples
42 | -text:
43 | -0 -> "I seriously laughed so much hahahaha (seals with headphones...) and appreciate both the interviewer and the subject. Major respect for two extraordinary humans - and in this time of gratefulness, I'm thankful for you both and this forum!"
44 | -
45 | -1 -> "I freakin love how Elon came to life the moment they started talking about gaming and specifically diablo, you can tell that he didn't want that part of the discussion to end, while Lex to move on to the next subject! Once a true gamer, always a true gamer!"
46 | -
47 | -2 -> "hello there! how are you?" (This one didn't work well, M1 model hallucinated)
48 | -
49 | -3 -> "Who doesn't love a good scary story, something to send a chill across your skin in the middle of summer's heat or really, any other time? And this year, we're celebrating the two hundredth birthday of one of the most famous scary stories of all time: Frankenstein."
50 | -
51 | -
52 | -
53 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/66fc7a08-3e8a-4d63-a3fa-88bc705a172a
54 | -
55 | -
56 | -
57 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/5acf5a4b-aeb8-4f14-94fe-45811868a886
58 | -
59 | -
60 | -
61 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/0af2ce6e-4172-4aac-9322-4fd545f1d4ac
62 | -
63 | -
64 | -
65 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/2d5b0335-d1fc-473a-aea8-c5bb6afbce27
66 | -
67 | -
68 | -
69 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/a63ba39f-a261-4fe6-8d06-a172a993acc1
70 | -
71 | -
72 | -
73 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/4355f633-9b27-4290-a284-96d650f5f4b8
74 | -
75 | -
76 | -
77 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/7c93d81e-02bc-4819-a97b-d48e39ec5689
78 | -
79 | -
80 | -
81 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/63456535-0b38-429a-a8a0-686cfb6a92c5
82 | -
83 | -
84 | -
85 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/960aa78c-888f-4f0b-a380-145a87f65a99
86 | -
87 | -
88 | -
89 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/5027f0eb-3601-468b-9dda-6b436b774741
90 | -
91 | -
92 | -
93 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/266285e0-a8f3-4784-81dc-f98b0a9c9373
94 | -
95 | -
96 | -
97 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/68ba18d6-430b-41e7-84e5-e15990064836
98 | -
99 | -
100 | -
101 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/0f7321a7-efb1-407c-8b8c-69e812865739
102 | -
103 | -
104 | -
105 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/dcedffe6-d81b-4eff-95c0-cbd00279fdb7
106 | -
107 | -
108 | -
109 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/8050db3e-7acb-44be-a039-7e0b9e6a9905
110 | -
111 | -
112 | -
113 | -https://github.com/dubverse-ai/MahaTTS/assets/32906806/6486af1c-2e14-420b-8419-bf5e01fe49a5
114 | -
115 | -
16 |
17 |  <img width="993" alt="Screenshot 2023-11-19 at 11 53 52 PM" src="https://github.com/dubverse-ai/MahaTTS/assets/32906806/7429d3b6-3f19-4bd8-9005-ff9e16a698f8">
18 |
19 | +### Architecture
20 | +| Model (Smolie) | Parameters | Model Type | Output |
21 | +|:-------------------------:|:----------:|------------|:-----------------:|
22 | +| Text to Semantic (M1) | 69 M | Causal LM | 10,001 Tokens |
23 | +| Semantic to MelSpec(M2) | 108 M | Diffusion | 2x 80x Melspec |
24 | +| Hifi Gan Vocoder | 13 M | GAN | Audio Waveform |
25 |
26 |  ## Features
27 |  1. Multilinguality
44 |  - [ ] Smolie - indic
45 |  - [ ] Optimizations for inference
46 |