---
license: mit
library_name: safetensors
tags:
- text-to-speech
- tts
- hindi
- speech-synthesis
- code
datasets:
- SPRINGLab/IndicVoices-R_Hindi
language:
- hi
model_type: F5-TTS
base_model:
- SWivid/F5-TTS
---


# Hindi TTS (Text-to-Speech, 24kHz)

## Overview  
Hindi TTS is a Text-to-Speech model built on the F5-TTS architecture. Developed by FuturixAI and Quantum Works, it synthesizes natural-sounding Hindi speech and is distributed under the MIT license, which permits both research and commercial use.

## Key Features  
- **Language:** Hindi  
- **Sampling Rate:** 24 kHz  

## Training Data  
The model was trained on the **IndicVoices-R_Hindi** dataset provided by IIT Madras.  
- Dataset Link: [https://huggingface.co/datasets/SPRINGLab/IndicVoices-R_Hindi](https://huggingface.co/datasets/SPRINGLab/IndicVoices-R_Hindi)

## Usage Instructions  

### Prerequisites  
Ensure you have installed the necessary dependencies for the `f5-tts_infer-cli`. Refer to the GitHub repository for installation instructions:  
[https://github.com/rumourscape/F5-TTS](https://github.com/rumourscape/F5-TTS)

### Example Usage  

```bash
f5-tts_infer-cli \
--model "Futurix-AI/Hindi-TTS" \
--ref_audio "ref_audio.wav" \
--ref_text "यह संदर्भ ऑडियो की सामग्री, उपशीर्षक या लिप्यंतरण है।" \
--gen_text "यह एक उदाहरण है जो मॉडल से बोलने के लिए उत्पन्न किया गया है।"
```

#### Parameters:
- **`--model`**: The model to load. The example passes "Futurix-AI/Hindi-TTS"; replace it if you are using a different checkpoint.
- **`--ref_audio`**: Path to the reference audio file (e.g., "ref_audio.wav").
- **`--ref_text`**: Hindi text corresponding to the reference audio.
- **`--gen_text`**: Hindi text that the model should synthesize as speech.
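
The parameters above can be wrapped in a small shell function that checks its inputs before calling the CLI. This is only a minimal sketch: the function name `run_hindi_tts` and the `TTS_CLI` override variable are hypothetical helpers introduced here for illustration and are not part of F5-TTS. The flags are the same four used in the example command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of F5-TTS): validates inputs, then calls the CLI
# with the same flags as the example above.
run_hindi_tts() {
  local ref_audio="$1" ref_text="$2" gen_text="$3"

  # The reference audio must exist on disk.
  if [ ! -f "$ref_audio" ]; then
    echo "reference audio not found: $ref_audio" >&2
    return 1
  fi

  # Both the reference transcript and the text to generate are required.
  if [ -z "$ref_text" ] || [ -z "$gen_text" ]; then
    echo "ref_text and gen_text must be non-empty" >&2
    return 2
  fi

  # TTS_CLI is an assumed override (useful for dry runs); it defaults to the real CLI.
  "${TTS_CLI:-f5-tts_infer-cli}" \
    --model "Futurix-AI/Hindi-TTS" \
    --ref_audio "$ref_audio" \
    --ref_text "$ref_text" \
    --gen_text "$gen_text"
}
```

After sourcing the function, a call might look like `run_hindi_tts ref_audio.wav "<reference transcript>" "<text to speak>"`. Setting `TTS_CLI=echo` prints the arguments instead of running inference, which is a quick way to check quoting of the Hindi text.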

