Text-to-Speech
Transformers
audio
Commit d0a22d2 by rasenganai · 1 parent: 0ae50c5

Update README.md

Files changed (1)
  1. README.md +6 -75
README.md CHANGED
@@ -16,6 +16,12 @@ Since seamless M4t wav2vec2 is trained on multilingual data, it makes this model
 
  <img width="993" alt="Screenshot 2023-11-19 at 11 53 52 PM" src="https://github.com/dubverse-ai/MahaTTS/assets/32906806/7429d3b6-3f19-4bd8-9005-ff9e16a698f8">
 
+ ### Architecture
+ | Model (Smolie) | Parameters | Model Type | Output |
+ |:-------------------------:|:----------:|------------|:-----------------:|
+ | Text to Semantic (M1) | 69 M | Causal LM | 10,001 Tokens |
+ | Semantic to MelSpec(M2) | 108 M | Diffusion | 2x 80x Melspec |
+ | Hifi Gan Vocoder | 13 M | GAN | Audio Waveform |
 
  ## Features
  1. Multilinguality
@@ -38,78 +44,3 @@ pip install maha-tts
  - [ ] Smolie - indic
  - [ ] Optimizations for inference
 
- ## Some Generated Samples
- text:
- 0 -> "I seriously laughed so much hahahaha (seals with headphones...) and appreciate both the interviewer and the subject. Major respect for two extraordinary humans - and in this time of gratefulness, I'm thankful for you both and this forum!"
-
- 1 -> "I freakin love how Elon came to life the moment they started talking about gaming and specifically diablo, you can tell that he didn't want that part of the discussion to end, while Lex to move on to the next subject! Once a true gamer, always a true gamer!"
-
- 2 -> "hello there! how are you?" (This one didn't work well, M1 model hallucinated)
-
- 3 -> "Who doesn't love a good scary story, something to send a chill across your skin in the middle of summer's heat or really, any other time? And this year, we're celebrating the two hundredth birthday of one of the most famous scary stories of all time: Frankenstein."
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/66fc7a08-3e8a-4d63-a3fa-88bc705a172a
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/5acf5a4b-aeb8-4f14-94fe-45811868a886
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/0af2ce6e-4172-4aac-9322-4fd545f1d4ac
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/2d5b0335-d1fc-473a-aea8-c5bb6afbce27
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/a63ba39f-a261-4fe6-8d06-a172a993acc1
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/4355f633-9b27-4290-a284-96d650f5f4b8
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/7c93d81e-02bc-4819-a97b-d48e39ec5689
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/63456535-0b38-429a-a8a0-686cfb6a92c5
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/960aa78c-888f-4f0b-a380-145a87f65a99
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/5027f0eb-3601-468b-9dda-6b436b774741
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/266285e0-a8f3-4784-81dc-f98b0a9c9373
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/68ba18d6-430b-41e7-84e5-e15990064836
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/0f7321a7-efb1-407c-8b8c-69e812865739
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/dcedffe6-d81b-4eff-95c0-cbd00279fdb7
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/8050db3e-7acb-44be-a039-7e0b9e6a9905
-
-
-
- https://github.com/dubverse-ai/MahaTTS/assets/32906806/6486af1c-2e14-420b-8419-bf5e01fe49a5
-
-
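The Architecture table added in this commit lists three stages: M1 (a 69 M-parameter causal LM) maps text to semantic tokens from a 10,001-token vocabulary, M2 (a 108 M-parameter diffusion model) maps those tokens to a mel spectrogram, and a 13 M-parameter HiFi-GAN vocoder turns the spectrogram into an audio waveform. The Python sketch below only traces that data flow and totals the listed parameter counts (69 + 108 + 13 = 190 M). It is not the maha-tts API: `text_to_semantic`, `semantic_to_mel`, `vocode` and `synthesize` are hypothetical stand-ins that emit placeholder data, and the one-token-per-character rate, the 80 mel bins, the reading of "2x" as two mel frames per token, and the 256-sample hop length are all assumptions, not values confirmed by the commit.

```python
"""Data-flow sketch of the three-stage Smolie pipeline from the Architecture table.

Every stage function here is a HYPOTHETICAL stand-in, not the maha-tts API.
The stubs only produce placeholder data of plausible shapes.
"""
import random
from dataclasses import dataclass

SEMANTIC_VOCAB = 10_001  # M1 output size, taken from the table
N_MELS = 80              # assumption: "80x Melspec" read as 80 mel bins
FRAMES_PER_TOKEN = 2     # assumption: one possible reading of "2x"
HOP_LENGTH = 256         # assumption: waveform samples per mel frame


@dataclass(frozen=True)
class Stage:
    name: str
    params_millions: int
    model_type: str


# Values copied from the Architecture table.
STAGES = [
    Stage("Text to Semantic (M1)", 69, "Causal LM"),
    Stage("Semantic to MelSpec (M2)", 108, "Diffusion"),
    Stage("HiFi-GAN Vocoder", 13, "GAN"),
]


def text_to_semantic(text: str, rng: random.Random) -> list[int]:
    """Stand-in for M1: one random semantic token per character (assumption)."""
    return [rng.randrange(SEMANTIC_VOCAB) for _ in text]


def semantic_to_mel(tokens: list[int]) -> list[list[float]]:
    """Stand-in for M2: an N_MELS x frames matrix of zeros."""
    frames = len(tokens) * FRAMES_PER_TOKEN
    return [[0.0] * frames for _ in range(N_MELS)]


def vocode(mel: list[list[float]]) -> list[float]:
    """Stand-in for the vocoder: HOP_LENGTH silent samples per mel frame."""
    frames = len(mel[0]) if mel else 0
    return [0.0] * (frames * HOP_LENGTH)


def synthesize(text: str, seed: int = 0) -> list[float]:
    """Chain the three stages: text -> semantic tokens -> mel -> waveform."""
    rng = random.Random(seed)
    tokens = text_to_semantic(text, rng)
    mel = semantic_to_mel(tokens)
    return vocode(mel)


if __name__ == "__main__":
    total = sum(s.params_millions for s in STAGES)
    print(f"total parameters: {total} M")  # 69 + 108 + 13 = 190
    audio = synthesize("hello there! how are you?")
    print(f"placeholder waveform length: {len(audio)} samples")
```

Replacing each stub with a real model would change only the function bodies; the chaining in `synthesize` would stay the same.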