---
Model Type: Text to Speech
Supported Languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu
---
<img src="https://api.visitorbadge.io/api/visitors?path=https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API&label=VISITORS&countColor=%234285f4" align="right"><br><br>
***Demo: [IITM-TTS Demo](https://iitm-tts.onrender.com) | The demo may take about 30 seconds to load on first access and goes idle after 15 minutes of inactivity.***
# Fastspeech2_HS_Flask_API
This repository contains the Flask API implementation of the Text to Speech Model developed by the Speech Lab at IIT Madras.
For a comprehensive understanding of the models and inference details, please consult the original repository
[Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS).
### Table of Contents
- [Setup](#setup)
- [Installation](#installation)
- [Run Flask server](#run-flask-server)
- [API](#api)
- [Citation for the original repo](#citation-for-the-original-repo)
### Setup
Some of the large files in this repository are stored with Git LFS. Install Git LFS before cloning so that these files are downloaded correctly (the commands below are for Debian/Ubuntu):
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```
Because of GitHub's storage limits for Git LFS, the entire repository, including the models, is hosted on Hugging Face as
[Fastspeech2_HS_Flask_API](https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API).
To clone the repository from Hugging Face, run:
```bash
git clone https://huggingface.co/k-m-irfan/Fastspeech2_HS_Flask_API
```
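If Git LFS was not installed before cloning, the large model files may be checked out as small text pointer files instead of the actual weights. The sketch below scans a directory for such pointers; it assumes the standard Git LFS pointer format (spec v1), in which every pointer file is under 1024 bytes and starts with `version https://git-lfs.github.com/spec/v1`. The function name `find_lfs_pointers` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this line (LFS spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str, max_pointer_size: int = 1024) -> List[Path]:
    """Return files under `root` that look like Git LFS pointers.

    A pointer is a small text file left in place of the real content when
    the LFS object was not downloaded, so only files of at most
    `max_pointer_size` bytes are read.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"directory not found: {root_path}")

    pointers = []
    for path in root_path.rglob("*"):
        # Skip directories and anything too large to be a pointer file.
        if not path.is_file() or path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return sorted(pointers)
```

For example, if `find_lfs_pointers("Fastspeech2_HS_Flask_API/models")` returns a non-empty list, running `git lfs pull` inside the cloned repository should fetch the missing model files.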
Alternatively, you can download the models from the original repository [Fastspeech2_HS](https://github.com/smtiitm/Fastspeech2_HS)
and arrange them in the folder structure shown below. Skip this step if you have already cloned the repository from Hugging Face.
```text
models
├── hindi
│   ├── female
│   └── male
├── tamil
│   ├── female
│   └── male
├── ...
└── marathi
    ├── female
    └── male
```
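After arranging the downloaded models by hand, a quick check can catch missing folders before the server tries to load them. The sketch below assumes only the layout shown above (`models/<language>/<gender>` with `female` and `male` subfolders); it does not know which files each folder must contain, so it only flags gender folders that are missing or empty. The function name `check_model_layout` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

GENDERS = ("female", "male")


def check_model_layout(models_dir: str) -> Dict[str, List[str]]:
    """Map each language folder under `models_dir` to its missing gender folders.

    A gender folder counts as missing if it does not exist or is empty.
    Only languages with problems appear in the result, so an empty dict
    means the layout looks complete.
    """
    root = Path(models_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"models directory not found: {root}")

    problems: Dict[str, List[str]] = {}
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [
            g for g in GENDERS
            if not (lang_dir / g).is_dir() or not any((lang_dir / g).iterdir())
        ]
        if missing:
            problems[lang_dir.name] = missing
    return problems
```

Usage: `print(check_model_layout("models"))`. Since the sample client notes that the Bodo male voice has not been added yet, a missing `male` entry for the Bodo folder may be expected.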
### Installation
Create a virtual environment and activate it:
```bash
python3 -m venv tts-hs-hifigan
source tts-hs-hifigan/bin/activate
```
Install the required dependencies by running:
```bash
pip install -r requirements.txt
```
### Run Flask server
Before starting the server, check that the application runs without errors using either of the following commands:
```bash
python3 flask_app.py
# OR
gunicorn -w 2 -b 0.0.0.0:5000 flask_app:app --timeout 600
```
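The gunicorn flags above can also be kept in a configuration file, which gunicorn reads as Python. A minimal sketch, assuming you want the same settings as the command line (two workers, port 5000, 600-second timeout); the file name `gunicorn.conf.py` is a common convention, not a file this repository ships.

```python
# gunicorn.conf.py: mirrors `gunicorn -w 2 -b 0.0.0.0:5000 --timeout 600`.
bind = "0.0.0.0:5000"  # listen on all interfaces, port 5000
workers = 2            # separate worker processes; memory use grows with this number
timeout = 600          # workers silent for longer than this (seconds) are restarted
```

With this file in place, the server can be started with `gunicorn -c gunicorn.conf.py flask_app:app`.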
If the application runs without errors, start the server with the provided script:
```bash
bash start.sh
```
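The server may need some time after launch before it accepts requests (for example, while models are being loaded), so a client script may want to wait for it first. The sketch below only checks that something is listening on the TCP port; it assumes the host and port from the commands above (`localhost:5000`) and does not call any endpoint. `wait_for_server` is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 5000,
                    timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_server()` before sending the first request avoids connection errors while the server is still starting up.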
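### API
The server exposes a `/tts` endpoint that accepts a JSON `POST` request and returns the synthesized speech as a base64-encoded string in the `audio` field of the JSON response. The following client sends a request and saves the result as `tts.wav`: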
```python
"""
Sample client code that sends text to the server and receives the
synthesized speech for it.

Supported languages:
Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri,
Marathi, Odia, Punjabi, Rajasthani, Tamil, Telugu, Urdu
"""
import base64

import requests

# Endpoint of the locally running Flask/gunicorn server
url = "http://localhost:5000/tts"

lang = "hindi"
gender = "female"

text = "सुप्रभात, आप कैसे हैं?"  # hindi
# text = "സുപ്രഭാതം, സുഖമാ?"  # malayalam
# text = "সুপ্ৰভাত, তুমি কেনে?"  # manipuri
# text = "सुप्रभात, तुम्ही कसे आहात?"  # marathi
# text = "ಶುಭೋದಯ, ನೀವು ಹೇಗಿದ್ದೀರಿ?"  # kannada
# text = "बसु म्विथ्बो, बरि दिबाबो?"  # bodo (male voice not yet added)
# text = "Good morning, how are you?"  # english
# text = "সুপ্ৰভাত, আপুনি কেমন আছে?"  # assamese
# text = "காலை வணக்கம், நீங்கள் எப்படி இருக்கின்றீர்கள்?"  # tamil
# text = "ସୁପ୍ରଭାତ, ଆପଣ କେମିତି ଅଛନ୍ତି?"  # odia
# text = "सुप्रभात, आप कैसे छो?"  # rajasthani
# text = "శుభోదయం, మీరు ఎలా ఉన్నారు?"  # telugu
# text = "সুপ্রভাত, আপনি কেমন আছেন?"  # bengali
# text = "સુપ્રભાત, તમે કેમ છો?"  # gujarati

payload = {
    "input": text,
    "gender": gender,
    "lang": lang,
    "alpha": 1,  # controls speaking speed (1 = normal)
}

# `json=` serializes the payload and sets the Content-Type header
response = requests.post(url, json=payload, timeout=600)
response.raise_for_status()  # fail early on HTTP errors

# The server returns base64-encoded audio; decode it and save it to disk
audio = base64.b64decode(response.json()["audio"])
file_name = "tts.wav"
with open(file_name, "wb") as wav_file:
    wav_file.write(audio)
```
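Because the server returns the audio as a base64 string in the `audio` field of the JSON response, the decoded bytes can be sanity-checked with Python's standard `wave` module before they are written to disk. The sketch below assumes the decoded bytes form a WAV file, as the `tts.wav` file name in the example suggests; `save_wav_from_response` is a hypothetical helper, and it reads the sample rate from the file header rather than assuming one.

```python
import base64
import io
import wave


def save_wav_from_response(response_json: dict, file_name: str = "tts.wav") -> float:
    """Decode the base64 `audio` field, check it is a WAV file, and save it.

    Returns the clip duration in seconds. Raises KeyError if the response has
    no `audio` field and wave.Error if the bytes are not a valid WAV file.
    """
    audio_bytes = base64.b64decode(response_json["audio"])

    # Parse the WAV header in memory before writing anything to disk.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    with open(file_name, "wb") as wav_file:
        wav_file.write(audio_bytes)
    return duration
```

In the client above, the last four lines could be replaced by `save_wav_from_response(response.json(), file_name)`, which also reports how long the synthesized clip is.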
### Citation for the original repo
If you use this Fastspeech2 Model in your research or work, please consider citing:
> COPYRIGHT
> 2023, Speech Technology Consortium,
> Bhashini, MeiTY and by Hema A Murthy & S Umesh,
> DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
> and
> ELECTRICAL ENGINEERING,
> IIT MADRAS. ALL RIGHTS RESERVED
Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].
[![CC BY 4.0][cc-by-image]][cc-by]
[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg