Vishwas1 committed
Commit 8508174 · verified · 1 Parent(s): 2537fdf

Upload 3 files

Files changed (3)
  1. README.md +88 -13
  2. app.py +177 -0
  3. requirements.txt +8 -0
README.md CHANGED
@@ -1,13 +1,88 @@
- ---
- title: KittenTTSDemo
- emoji: 📊
- colorFrom: blue
- colorTo: pink
- sdk: gradio
- sdk_version: 5.41.1
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🎤 KittenTTS - High Quality Text-to-Speech
+
+ A Hugging Face Space showcasing the KittenTTS model for high-quality text-to-speech generation.
+
+ ## 🚀 Features
+
+ - **8 Different Voices**: 4 male and 4 female voices to choose from
+ - **High Quality Audio**: 24kHz sample rate for crisp, clear speech
+ - **GPU-Free**: Works without requiring a GPU
+ - **Easy-to-Use Interface**: Simple and intuitive Gradio web interface
+ - **Real-time Generation**: Fast speech synthesis with progress tracking
+
+ ## 🎵 Available Voices
+
+ | Voice ID | Gender | Description |
+ |----------|--------|-------------|
+ | `expr-voice-2-m` | Male | Male voice variant 2 |
+ | `expr-voice-2-f` | Female | Female voice variant 2 |
+ | `expr-voice-3-m` | Male | Male voice variant 3 |
+ | `expr-voice-3-f` | Female | Female voice variant 3 |
+ | `expr-voice-4-m` | Male | Male voice variant 4 |
+ | `expr-voice-4-f` | Female | Female voice variant 4 |
+ | `expr-voice-5-m` | Male | Male voice variant 5 |
+ | `expr-voice-5-f` | Female | Female voice variant 5 |
+
+ ## 🛠️ Usage
+
+ 1. **Enter Text**: Type or paste your text in the input box
+ 2. **Select Voice**: Choose from the dropdown menu of available voices
+ 3. **Generate**: Click the "Generate Speech" button or press Enter
+ 4. **Download**: Play the generated audio or download it
+
+ ## 💻 Technical Details
+
+ - **Model**: [KittenML/kitten-tts-nano-0.1](https://huggingface.co/KittenML/kitten-tts-nano-0.1)
+ - **Sample Rate**: 24kHz
+ - **Framework**: KittenTTS
+ - **Interface**: Gradio
+ - **Audio Format**: WAV (24kHz, mono)
+
+ ## 🔧 Local Development
+
+ To run this locally:
+
+ ```bash
+ # Clone the repository
+ git clone <your-repo-url>
+ cd <your-repo-name>
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the application
+ python app.py
+ ```
+
+ ## 📦 Dependencies
+
+ - `gradio>=4.0.0` - Web interface
+ - `kittentts` - TTS framework
+ - `soundfile` - Audio file handling
+ - `numpy` - Numerical operations
+ - `torch` - PyTorch backend
+ - `torchaudio` - Audio processing
+ - `transformers` - Hugging Face transformers
+ - `accelerate` - Model acceleration
+
+ ## 🤝 Contributing
+
+ Feel free to contribute by:
+ - Reporting bugs
+ - Suggesting new features
+ - Improving the UI
+ - Adding more voice options
+
+ ## 📄 License
+
+ This project uses the KittenTTS model. Please refer to the original model's license for usage terms.
+
+ ## 🙏 Acknowledgments
+
+ - [KittenML](https://huggingface.co/KittenML) for the TTS model
+ - [Hugging Face](https://huggingface.co) for the Spaces platform
+ - [Gradio](https://gradio.app) for the web interface framework
+
+ ---
+
+ **Note**: This is a demonstration of the KittenTTS model. For production use, please ensure compliance with the model's license and terms of use.
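The eight voice IDs in the README table follow a visible `expr-voice-<variant>-<m|f>` pattern. As a minimal sketch, assuming that pattern holds for every ID (it is inferred only from the table, not from KittenTTS documentation), the Gender and variant columns could be derived from the IDs rather than maintained by hand. `VoiceInfo`, `parse_voice_id`, and `voices_by_gender` are hypothetical helper names and are not part of the `kittentts` package.

```python
import re
from typing import NamedTuple

# The eight voice IDs listed in the README table.
VOICE_IDS = [
    "expr-voice-2-m", "expr-voice-2-f", "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f", "expr-voice-5-m", "expr-voice-5-f",
]


class VoiceInfo(NamedTuple):
    """Hypothetical record type for one row of the voice table."""
    voice_id: str
    variant: int
    gender: str


# Assumed naming pattern, inferred only from the IDs above.
_VOICE_RE = re.compile(r"^expr-voice-(\d+)-([mf])$")


def parse_voice_id(voice_id: str) -> VoiceInfo:
    """Split a voice ID into its variant number and gender label."""
    match = _VOICE_RE.match(voice_id)
    if match is None:
        raise ValueError(f"Unrecognized voice ID: {voice_id!r}")
    gender = "Male" if match.group(2) == "m" else "Female"
    return VoiceInfo(voice_id, int(match.group(1)), gender)


def voices_by_gender(voice_ids):
    """Group voice IDs under 'Male' and 'Female', keeping input order."""
    grouped = {"Male": [], "Female": []}
    for voice_id in voice_ids:
        grouped[parse_voice_id(voice_id).gender].append(voice_id)
    return grouped
```

Calling `voices_by_gender(VOICE_IDS)` yields the same male/female split that the README and the sidebar HTML in `app.py` spell out manually, so a single list could feed both.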
app.py ADDED
@@ -0,0 +1,177 @@
+ import gradio as gr
+ import soundfile as sf
+ import numpy as np
+ from kittentts import KittenTTS
+ import os
+
+ # Initialize the model
+ model = KittenTTS("KittenML/kitten-tts-nano-0.1")
+
+ # Available voices
+ AVAILABLE_VOICES = [
+     'expr-voice-2-m', 'expr-voice-2-f', 'expr-voice-3-m', 'expr-voice-3-f',
+     'expr-voice-4-m', 'expr-voice-4-f', 'expr-voice-5-m', 'expr-voice-5-f'
+ ]
+
+ def generate_speech(text, voice, progress=gr.Progress()):
+     """
+     Generate speech from text using KittenTTS
+     """
+     if not text.strip():
+         return None, "Please enter some text to generate speech."
+
+     try:
+         progress(0.3, desc="Loading model...")
+
+         # Generate audio
+         progress(0.6, desc="Generating speech...")
+         audio = model.generate(text, voice=voice)
+
+         progress(0.9, desc="Processing audio...")
+
+         # Convert to the format expected by Gradio
+         # Ensure audio is in the correct format (float32, mono)
+         if len(audio.shape) > 1:
+             audio = audio.mean(axis=1)  # Convert stereo to mono if needed
+
+         # Normalize audio
+         audio = audio / np.max(np.abs(audio)) if np.max(np.abs(audio)) > 0 else audio
+
+         progress(1.0, desc="Complete!")
+
+         return (24000, audio), f"✅ Successfully generated speech with voice: {voice}"
+
+     except Exception as e:
+         return None, f"❌ Error generating speech: {str(e)}"
+
+ def create_demo():
+     """
+     Create the Gradio demo interface
+     """
+
+     # Custom CSS for better styling
+     css = """
+     .gradio-container {
+         max-width: 800px !important;
+         margin: auto !important;
+     }
+     .main-header {
+         text-align: center;
+         margin-bottom: 2rem;
+     }
+     .voice-selector {
+         margin: 1rem 0;
+     }
+     .output-audio {
+         margin-top: 1rem;
+     }
+     """
+
+     with gr.Blocks(css=css, title="KittenTTS - High Quality Text-to-Speech") as demo:
+
+         # Header
+         gr.HTML("""
+         <div class="main-header">
+             <h1>🎤 KittenTTS</h1>
+             <p><em>High Quality Text-to-Speech Generation</em></p>
+             <p>Generate natural-sounding speech from text using the KittenTTS model</p>
+         </div>
+         """)
+
+         with gr.Row():
+             with gr.Column(scale=2):
+                 # Text input
+                 text_input = gr.Textbox(
+                     label="Enter your text",
+                     placeholder="Type or paste your text here...",
+                     lines=4,
+                     max_lines=10
+                 )
+
+                 # Voice selection
+                 voice_dropdown = gr.Dropdown(
+                     choices=AVAILABLE_VOICES,
+                     value=AVAILABLE_VOICES[1],  # Default to female voice
+                     label="Select Voice",
+                     info="Choose from 8 different voices (4 male, 4 female)"
+                 )
+
+                 # Generate button
+                 generate_btn = gr.Button(
+                     "🎵 Generate Speech",
+                     variant="primary",
+                     size="lg"
+                 )
+
+             with gr.Column(scale=1):
+                 # Voice info
+                 gr.HTML("""
+                 <div style="background: #f0f0f0; padding: 1rem; border-radius: 8px;">
+                     <h3>Available Voices:</h3>
+                     <ul>
+                         <li><strong>Male voices:</strong> expr-voice-2-m, expr-voice-3-m, expr-voice-4-m, expr-voice-5-m</li>
+                         <li><strong>Female voices:</strong> expr-voice-2-f, expr-voice-3-f, expr-voice-4-f, expr-voice-5-f</li>
+                     </ul>
+                 </div>
+                 """)
+
+         # Output section
+         with gr.Row():
+             with gr.Column():
+                 # Audio output
+                 audio_output = gr.Audio(
+                     label="Generated Audio",
+                     # gr.Audio takes no sample_rate argument; the 24 kHz rate comes from the tuple returned by generate_speech
+                     type="numpy"
+                 )
+
+                 # Status message
+                 status_output = gr.Textbox(
+                     label="Status",
+                     interactive=False
+                 )
+
+         # Example texts
+         gr.Examples(
+             examples=[
+                 ["Hello! This is a demonstration of the KittenTTS model.", "expr-voice-2-f"],
+                 ["The quick brown fox jumps over the lazy dog.", "expr-voice-2-m"],
+                 ["Welcome to our high-quality text-to-speech system.", "expr-voice-3-f"],
+                 ["This model works without requiring a GPU.", "expr-voice-3-m"],
+             ],
+             inputs=[text_input, voice_dropdown],
+             label="Try these examples:"
+         )
+
+         # Footer
+         gr.HTML("""
+         <div style="text-align: center; margin-top: 2rem; padding: 1rem; background: #f9f9f9; border-radius: 8px;">
+             <p><strong>KittenTTS</strong> - Powered by <a href="https://huggingface.co/KittenML/kitten-tts-nano-0.1" target="_blank">KittenML/kitten-tts-nano-0.1</a></p>
+             <p>Model: KittenTTS Nano v0.1 | Sample Rate: 24kHz</p>
+         </div>
+         """)
+
+         # Connect the generate button
+         generate_btn.click(
+             fn=generate_speech,
+             inputs=[text_input, voice_dropdown],
+             outputs=[audio_output, status_output]
+         )
+
+         # Auto-generate when text is entered and Enter is pressed
+         text_input.submit(
+             fn=generate_speech,
+             inputs=[text_input, voice_dropdown],
+             outputs=[audio_output, status_output]
+         )
+
+     return demo
+
+ # Create and launch the demo
+ if __name__ == "__main__":
+     demo = create_demo()
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False
+     )
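`generate_speech` peak-normalizes the samples and returns them at 24 kHz, and the README describes the output as mono WAV. The sketch below illustrates those two steps outside Gradio, under stated assumptions: samples are a plain Python sequence of floats, the output is 16-bit PCM (the bit depth is not stated anywhere in the commit), and the standard-library `wave` module stands in for `soundfile`, which `app.py` imports but does not call. `peak_normalize`, `float_to_pcm16`, and `write_wav` are hypothetical helper names.

```python
import io
import struct
import wave

SAMPLE_RATE = 24000  # same rate that generate_speech returns alongside the audio


def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1.0; silence is returned unchanged."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]


def float_to_pcm16(samples):
    """Convert floats to little-endian 16-bit PCM bytes, clipping to [-1.0, 1.0] first."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit WAV data to a file path or a seekable binary file object."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit (assumed bit depth)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))
```

For example, `write_wav("out.wav", peak_normalize(samples))` would save a normalized clip, and passing an `io.BytesIO()` instead of a path keeps the result in memory.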
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.0.0
+ kittentts
+ soundfile
+ numpy
+ torch
+ torchaudio
+ transformers
+ accelerate
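Before running `python app.py` locally, it can help to confirm that every package in `requirements.txt` is actually installed. The following is a rough sketch that uses only the standard library's `importlib.metadata`; the `REQUIREMENTS` string simply repeats the eight lines added above, and `requirement_names` and `missing_packages` are hypothetical helpers. It checks presence only and does not evaluate version specifiers such as `>=4.0.0`.

```python
import re
from importlib import metadata

# Contents of requirements.txt as added in this commit.
REQUIREMENTS = """\
gradio>=4.0.0
kittentts
soundfile
numpy
torch
torchaudio
transformers
accelerate
"""


def requirement_names(text):
    """Extract bare distribution names, dropping comments and version specifiers."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the text before the first specifier, marker, extra, or space.
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

`missing_packages(requirement_names(REQUIREMENTS))` returns an empty list when everything is installed, and otherwise names the packages that `pip install -r requirements.txt` still needs to provide.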