---
Model Type: Text to Speech
Supported Languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu
---

<img src="https://api.visitorbadge.io/api/visitors?path=https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API&label=VISITORS&countColor=%234285f4" align="right"></br></br>

***Demo: [IITM-TTS Demo](https://iitm-tts.onrender.com) | The demo may take about 30 seconds to load on first access and goes idle after 15 minutes of inactivity.***

# Fastspeech2_HS_Flask_API

This repository provides a Flask API for the text-to-speech (TTS) models developed by the Speech Lab at IIT Madras.
For details on the models and on inference, see the original repository
[Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS).

### Table of Contents
- [Setup](#setup)
- [Installation](#installation)
- [Run Flask server](#run-flask-server)
- [API](#api)
- [Citation for the original repo](#citation-for-the-original-repo)

### Setup
Some of the large files in this repository are stored with Git LFS.
Install Git LFS before cloning (the commands below are for Debian/Ubuntu):

```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```

Because of GitHub's Git LFS size limits, the entire repository, including the models, is hosted on Hugging Face at
[Fastspeech2_HS_Flask_API](https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API).
To clone it from Hugging Face, run:

```bash
git clone https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API
```
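If Git LFS was not installed before cloning, the model files may be checked out as small text pointer files instead of the actual weights. The sketch below lists such pointers by checking for the header line that the Git LFS pointer specification places at the start of every pointer file. It assumes you run it from the repository root and that the models live under `models/`; the function name `find_lfs_pointers` is hypothetical. If it reports any files, running `git lfs pull` inside the repository should download the real content.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS specification.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"
# The specification limits pointer files to under 1024 bytes; real weights are far larger.
MAX_POINTER_SIZE = 1024


def find_lfs_pointers(root="models"):
    """Return sorted paths under `root` that look like un-fetched Git LFS pointers."""
    pointers = []
    for path in Path(root).rglob("*"):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.parts:
            continue
        if path.stat().st_size >= MAX_POINTER_SIZE:
            continue
        with open(path, "rb") as f:
            if f.read(len(LFS_HEADER)) == LFS_HEADER:
                pointers.append(path)
    return sorted(pointers)


if __name__ == "__main__":
    models_dir = Path("models")  # assumed location of the model folders
    if models_dir.is_dir():
        for p in find_lfs_pointers(models_dir):
            print(f"Not downloaded (LFS pointer): {p}")
```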

Alternatively, download the models from the original repository [Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS)
and arrange them in the folder structure shown below. Skip this step if you cloned the repository from Hugging Face.

```text
models
├── hindi
│   ├── female
│   └── male
├── tamil
│   ├── female
│   └── male
.
.
.
└── marathi
    ├── female
    └── male
```
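After a manual download, a small check like the following can list any `<language>/<gender>` folders that are missing. This is a sketch under assumptions: it assumes every folder is named with the lowercase English language name, as `hindi`, `tamil` and `marathi` are in the tree above, and it only checks that the directories exist, not what they contain. The `LANGUAGES` tuple and the `missing_voices` helper are hypothetical; adjust the names to match the original repository. Note that the sample client in the API section marks the Bodo male voice as not yet added, so `bodo/male` may legitimately be reported.

```python
from pathlib import Path

# Assumed folder names: lowercase English names of the supported languages.
LANGUAGES = (
    "assamese", "bengali", "bodo", "gujarati", "hindi", "kannada",
    "malayalam", "manipuri", "marathi", "odia", "punjabi", "rajasthani",
    "tamil", "telugu", "urdu",
)
GENDERS = ("female", "male")


def missing_voices(root="models", languages=LANGUAGES, genders=GENDERS):
    """Return (language, gender) pairs whose folder is absent under `root`."""
    base = Path(root)
    return [
        (lang, gender)
        for lang in languages
        for gender in genders
        if not (base / lang / gender).is_dir()
    ]


if __name__ == "__main__":
    for lang, gender in missing_voices():
        print(f"missing: models/{lang}/{gender}")
```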

### Installation

Create a virtual environment and activate it:
```bash
python3 -m venv tts-hs-hifigan
source tts-hs-hifigan/bin/activate
```

Install the required dependencies by running:
```bash
pip install -r requirements.txt
```

### Run Flask server
First run the application directly with one of the following commands and check the output for errors:
```bash
python3 flask_app.py
# OR
gunicorn -w 2 -b 0.0.0.0:5000 flask_app:app --timeout 600
```

If the application runs without errors, start the server with the provided script:
```bash
bash start.sh
```
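Server startup may take some time, so a client script may need to wait until the server accepts connections before sending requests. A minimal sketch using only the Python standard library, assuming the server listens on `localhost:5000` as in the `gunicorn` command above; the helper name `wait_for_server` is hypothetical. It only checks that the TCP port is open, not that the models have finished loading or that `/tts` returns audio.

```python
import socket
import time


def wait_for_server(host="localhost", port=5000, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(interval)
    return False
```

For example, calling `wait_for_server()` at the top of the client script in the API section lets it fail cleanly instead of raising a connection error while the server is still starting.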

### API
```python
"""
This is a sample API code to send a text to the server and recieve speech
for the given text.

Supported languages: 

Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri
Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu

"""
import requests
import json
import base64

# endpoint
url = "http://localhost:5000/tts"

lang = 'hindi'
gender = 'female'
text = "सुप्रभात, आप कैसे हैं?" # hindi
# text = "സുപ്രഭാതം, സുഖമാ?" # malayalam
# text = "সুপ্ৰভাত, তুমি কেনে?" # manipuri
# text = "सुप्रभात, तुम्ही कसे आहात?" # marathi
# text = "ಶುಭೋದಯ, ನೀವು ಹೇಗಿದ್ದೀರಿ?" # kannada
# text = "बसु म्विथ्बो, बरि दिबाबो?" # bodo male yet to be added <---
# text = "Good morning, how are you?" # english
# text = "সুপ্ৰভাত, আপুনি কেমন আছে?" # assamese
# text = "காலை வணக்கம், நீங்கள் எப்படி இருக்கின்றீர்கள்?" # tamil
# text = "ସୁପ୍ରଭାତ, ଆପଣ କେମିତି ଅଛନ୍ତି?"
# text = "सुप्रभात, आप कैसे छो?" # rajasthani
# text = "శుభోదయం, మీరు ఎలా ఉన్నారు?" # telugu
# text = "সুপ্রভাত, আপনি কেমন আছেন?" # bengali
# text = "સુપ્રભાત, તમે કેમ છો?" # gujarati

payload = json.dumps(
    {
        "input": text,
        "gender": gender,
        "lang": lang,
        "alpha": 1,  # to control speed
    }
)

headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=payload, timeout=600)
response.raise_for_status()  # stop on HTTP errors instead of failing in .json()

# the server returns base64-encoded audio; decode it and save it to a file
audio = response.json()['audio']
file_name = "tts.wav"
with open(file_name, 'wb') as wav_file:
    wav_file.write(base64.b64decode(audio))
```
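For scripts that synthesize several sentences, the request and decoding steps above can be wrapped in small helpers. The sketch below keeps the payload fields (`input`, `gender`, `lang`, `alpha`) and the `audio` response key from the example, but uses the standard library's `urllib.request` instead of `requests` so it has no extra dependency. The helper names are hypothetical, and it assumes the decoded bytes form a standard PCM WAV file (the example saves them as `tts.wav`), which lets Python's `wave` module report the clip duration.

```python
import base64
import io
import json
import urllib.request
import wave

TTS_URL = "http://localhost:5000/tts"  # endpoint from the example above


def build_payload(text, lang, gender="female", alpha=1):
    """Serialize the request body using the fields from the API example."""
    return json.dumps({"input": text, "gender": gender, "lang": lang, "alpha": alpha})


def decode_wav(audio_b64, file_name):
    """Decode the base64 `audio` field, save it, and return its duration in seconds."""
    wav_bytes = base64.b64decode(audio_b64)
    with open(file_name, "wb") as f:
        f.write(wav_bytes)
    # Assumes a standard PCM WAV container; wave raises wave.Error otherwise.
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def synthesize(text, lang, gender="female", alpha=1, file_name="tts.wav", url=TTS_URL):
    """POST one sentence to the server, save the returned audio, return its duration."""
    request = urllib.request.Request(
        url,
        data=build_payload(text, lang, gender, alpha).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=600) as response:
        body = json.load(response)
    return decode_wav(body["audio"], file_name)
```

With the server running, `synthesize("सुप्रभात, आप कैसे हैं?", "hindi")` should write `tts.wav` and return its length in seconds.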

### Citation for the original repo
If you use this Fastspeech2 Model in your research or work, please consider citing:

“COPYRIGHT
2023, Speech Technology Consortium,
Bhashini, MeiTY and by Hema A Murthy & S Umesh,
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
and
ELECTRICAL ENGINEERING,
IIT MADRAS. ALL RIGHTS RESERVED”


Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg