---
Model Type: Text to Speech
Supported Languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu
---

<img src="https://api.visitorbadge.io/api/visitors?path=https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API&label=VISITORS&countColor=%234285f4" align="right"><br><br>

***Demo: [IITM-TTS Demo](https://iitm-tts.onrender.com) | This may take approximately 30 seconds to load the first time and will go idle after 15 minutes of inactivity.***

# Fastspeech2_HS_Flask_API

This repository contains the Flask API implementation of the Text to Speech Model developed by the Speech Lab at IIT Madras. 
For a comprehensive understanding of the models and inference details, please consult the original repository 
[Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS).

### Table of Contents
- [Setup](#setup)
- [Installation](#installation)
- [Run Flask server](#run-flask-server)
- [API](#api)
- [Citation for the original repo](#citation-for-the-original-repo)

### Setup
Some of the large files in this repository are stored with Git LFS.
To handle these files correctly, install the latest Git LFS with the following commands:

```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```

The entire repository, including the models, is hosted on Hugging Face at 
[Fastspeech2_HS_Flask_API](https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API) because of GitHub's size restrictions for Git LFS. 
To clone the repository from Hugging Face, use the following command:

```bash
git clone https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API
```

Alternatively, you can download the models from the original repository [Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS) 
and organize the folders as shown below. Skip this step if you have already cloned the repository from Hugging Face.

```text
models
├── hindi
│   ├── female
│   └── male
├── tamil
│   ├── female
│   └── male
.
.
.
└── marathi
    ├── female
    └── male
```
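
If you assemble the `models` folder by hand, a quick check like the one below can catch a missing voice folder before the server starts. This is only a sketch: the function name `missing_voice_dirs` is hypothetical, and it assumes every language folder holds `female` and `male` subfolders as in the tree above, which may not be true for every language.

```python
from pathlib import Path


def missing_voice_dirs(models_root, languages, genders=("female", "male")):
    """Return the expected <language>/<gender> folders that do not exist.

    Assumption: the layout is models_root/<language>/<gender>, as shown
    in the folder tree above.
    """
    root = Path(models_root)
    return [
        f"{lang}/{gender}"
        for lang in languages
        for gender in genders
        if not (root / lang / gender).is_dir()
    ]
```

For example, `missing_voice_dirs("models", ["hindi", "tamil", "marathi"])` returns an empty list when all six folders are present, and otherwise lists the ones that are missing.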

### Installation

Create a virtual environment and activate it:
```bash
python3 -m venv tts-hs-hifigan
source tts-hs-hifigan/bin/activate
```

Install the required dependencies by running:
```bash
pip install -r requirements.txt
```

### Run Flask server
Before proceeding, confirm that the server application runs correctly. Start it with either of the following commands and check the output for errors:
```bash
python3 flask_app.py
# OR
gunicorn -w 2 -b 0.0.0.0:5000 flask_app:app --timeout 600
```

If the application runs without any issues, start the server with the following command:
```bash
bash start.sh
```
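
To confirm from another terminal that the server is accepting connections, a small standard-library check can help. The sketch below assumes the server listens on `localhost:5000`, as in the `gunicorn` command above; the helper names `is_port_open` and `wait_for_server` are hypothetical. It only verifies that the TCP port is open, not that the models loaded correctly.

```python
import socket
import time


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host="localhost", port=5000, attempts=10, delay=3.0):
    """Poll the port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` returns `True` as soon as the port responds and `False` if it stays closed for all attempts.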

### API
```python
"""
Sample client code that sends text to the server and receives speech
for the given text.

Supported languages: 

Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri,
Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu

"""
import requests
import json
import base64

# endpoint
url = "http://localhost:5000/tts"

lang = 'hindi'
gender = 'female'
text = "सुप्रभात, आप कैसे हैं?" # hindi
# text = "സുപ്രഭാതം, സുഖമാ?" # malayalam
# text = "সুপ্ৰভাত, তুমি কেনে?" # manipuri
# text = "सुप्रभात, तुम्ही कसे आहात?" # marathi
# text = "ಶುಭೋದಯ, ನೀವು ಹೇಗಿದ್ದೀರಿ?" # kannada
# text = "बसु म्विथ्बो, बरि दिबाबो?" # bodo male yet to be added <---
# text = "Good morning, how are you?" # english
# text = "সুপ্ৰভাত, আপুনি কেমন আছে?" # assamese
# text = "காலை வணக்கம், நீங்கள் எப்படி இருக்கின்றீர்கள்?" # tamil
# text = "ସୁପ୍ରଭାତ, ଆପଣ କେମିତି ଅଛନ୍ତି?" # odia
# text = "सुप्रभात, आप कैसे छो?" # rajasthani
# text = "శుభోదయం, మీరు ఎలా ఉన్నారు?" # telugu
# text = "সুপ্রভাত, আপনি কেমন আছেন?" # bengali
# text = "સુપ્રભાત, તમે કેમ છો?" # gujarati

payload = json.dumps(
    {
    "input": text,
    "gender": gender,
    "lang": lang,
    "alpha": 1 # to control speed
    })

headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # stop here if the server returned an HTTP error

# decode the base64-encoded audio from the response and save it as a WAV file
audio = response.json()['audio']
file_name = "tts.wav"
with open(file_name, 'wb') as wav_file:
    wav_file.write(base64.b64decode(audio))
```
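
For repeated calls, the request and decoding steps above can be wrapped in small functions. The following is a sketch, not part of the repository: the names `build_payload`, `decode_audio`, and `synthesize` are hypothetical, the 600-second timeout simply mirrors the `gunicorn --timeout 600` setting, and the server's behaviour on invalid input is not documented here, so a missing `audio` field is treated as an error. It uses only the standard library so that it runs without extra packages.

```python
import base64
import json
import urllib.request


def build_payload(text, lang, gender, alpha=1):
    """Build the JSON body with the fields used in the sample above."""
    return {"input": text, "lang": lang, "gender": gender, "alpha": alpha}


def decode_audio(response_json):
    """Decode the base64 'audio' field of a server response into bytes."""
    if "audio" not in response_json:
        raise ValueError("response has no 'audio' field")
    return base64.b64decode(response_json["audio"])


def synthesize(text, lang, gender, url="http://localhost:5000/tts",
               alpha=1, timeout=600):
    """POST the text to the /tts endpoint and return the audio bytes."""
    body = json.dumps(build_payload(text, lang, gender, alpha)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_audio(json.load(response))
```

With the server running, `synthesize("सुप्रभात, आप कैसे हैं?", "hindi", "female")` returns the audio bytes, which can then be written to a file opened in `'wb'` mode.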

### Citation for the original repo
If you use this Fastspeech2 Model in your research or work, please consider citing:

"
COPYRIGHT
2023, Speech Technology Consortium,
Bhashini, MeiTY and by Hema A Murthy & S Umesh,
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
and
ELECTRICAL ENGINEERING,
IIT MADRAS. ALL RIGHTS RESERVED
"


Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg