---
library_name: transformers
tags:
  - text-to-speech
  - annotation
license: apache-2.0
language:
  - en
  - as
  - bn
  - gu
  - hi
  - kn
  - ks
  - or
  - ml
  - mr
  - ne
  - pa
  - sa
  - sd
  - ta
  - te
  - ur
  - om
pipeline_tag: text-to-speech
inference: false
base_model:
  - ai4bharat/indic-parler-tts
---

HelpingAI-TTS-v1 🎤🔥

Yo, what's good! Welcome to HelpingAI-TTS-v1, your go-to for next-level Text-to-Speech (TTS) that's all about personalization, vibes, and clarity. Whether you want your text to sound cheerful, emotional, or just like you're chatting with a friend, this model's got you covered. 💯

🚀 What’s HelpingAI-TTS-v1?

HelpingAI-TTS-v1 is a beast when it comes to generating high-quality, customizable speech. Built on ai4bharat/indic-parler-tts, it doesn’t just read your text in a flat, generic voice. Instead, it follows a plain-English description of how the speech should sound. Say how fast or slow it should be, or whether it’s cheerful or serious, and BOOM, you’ve got the audio you asked for. 🎧

🛠️ How It Works: A Quick Rundown 🔥

Transcript: The text you want spoken (passed as prompt in the code below). Keep it casual, formal, or whatever suits your vibe.
Caption: A description of how the speech should sound (passed as description in the code below). Want a fast-paced, hype vibe or a calm, soothing tone? Just say it. 🔥
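The caption is just free text, so when you’re trying lots of variations it can help to build it from a few fixed attributes. Here’s a minimal sketch of that idea. The helper build_description and its sentence template are hypothetical conveniences, not part of parler_tts or this model’s API, and the phrasing is illustrative rather than a documented caption format.

```python
from typing import Sequence


# Hypothetical helper (not part of parler_tts or this model's API):
# composes a free-text caption from a few delivery attributes.
def build_description(
    tone: str,
    pace: str,
    speaker: str = "The speaker",
    extras: Sequence[str] = (),
) -> str:
    """Join attribute phrases into one caption string for the description tokenizer."""
    parts = [f"{speaker} speaks in a {tone} tone at a {pace} pace."]
    # Normalize each extra phrase to end with exactly one period; skip blanks.
    parts.extend(e.strip().rstrip(".") + "." for e in extras if e.strip())
    return " ".join(parts)


if __name__ == "__main__":
    transcript = "Hey, what's up? How's it going?"  # what gets spoken
    caption = build_description(  # how it should sound
        tone="friendly, upbeat",
        pace="moderate",
        extras=("The recording is very clear",),
    )
    print(caption)
```

Keeping the template fixed and swapping one attribute at a time makes it easier to hear which part of the caption changed the output.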

💡 Features You’ll Love:

Expressive Speech: This isn’t just any TTS. You can describe the tone, speed, and vibe you want. Whether it's a peppy "Hey!" or a chill "What's up?", this model’s got your back.
Top-Notch Quality: Clean, clear audio that makes your words pop. For the cleanest output, ask for it in your description (for example, "very clear audio").
Customizable Like Never Before: Play with emotions, tone, and even accents. It’s all about making it personal. 🌍

🔧 Get Started: Installation 🔥

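Ready to vibe? Here’s how you set up HelpingAI-TTS-v1 in seconds. Install the parler-tts library, plus soundfile for saving audio; the model weights download automatically the first time you load them:

pip install git+https://github.com/huggingface/parler-tts.git
pip install soundfile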
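Before running the usage example, you can quickly confirm that everything it imports is actually available. This is a minimal sketch using Python’s built-in importlib; the module list mirrors the imports in the usage code, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names used by the usage example (parler_tts comes from the
# git install above; soundfile from `pip install soundfile`).
REQUIRED_MODULES = ("torch", "transformers", "parler_tts", "soundfile")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All set, you're ready to generate speech.")
```

find_spec only locates the modules without importing them, so the check stays fast even for heavy packages like torch.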

🖥️ Usage: Let's Make Some Magic 🎤

Here’s the code that gets the job done. It’s super simple to use: plug in your text, describe how you want it to sound, and run it. It’s like setting the mood for a movie.

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Choose your device (GPU if available, otherwise CPU)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model, the transcript tokenizer, and the description tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("HelpingAI/HelpingAI-TTS-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI-TTS-v1")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Customize your inputs: transcript (what to say) + description (how to say it)
prompt = "Hey, what's up? How’s it going?"
description = "A friendly, upbeat, and casual tone with a moderate speed. Speaker sounds confident and relaxed."

# Tokenize the inputs, keeping the attention masks
description_inputs = description_tokenizer(description, return_tensors="pt").to(device)
prompt_inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate the audio (description conditions the voice, prompt is the spoken text)
generation = model.generate(
    input_ids=description_inputs.input_ids,
    attention_mask=description_inputs.attention_mask,
    prompt_input_ids=prompt_inputs.input_ids,
    prompt_attention_mask=prompt_inputs.attention_mask,
)
audio_arr = generation.cpu().numpy().squeeze()

# Save the audio to a file at the model's native sampling rate
sf.write("output.wav", audio_arr, model.config.sampling_rate)

This saves your speech to output.wav, ready to play. 🔥
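If you’d rather not depend on soundfile, or just want to sanity-check what landed on disk, here’s a minimal sketch using only Python’s built-in wave module. It assumes the generated audio is a mono float waveform in roughly [-1, 1] (what audio_arr holds after .squeeze()); in that case you could call write_wav_pcm16("output.wav", audio_arr, model.config.sampling_rate). Both helper names are hypothetical.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (expected in [-1.0, 1.0]) as a 16-bit PCM WAV file."""
    # Clip to [-1, 1] to guard against overshoot, then scale to int16 range.
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # little-endian 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def wav_duration_seconds(path):
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    rate = 44_100  # example rate; use model.config.sampling_rate for real output
    # Stand-in for audio_arr: one second of a 440 Hz tone at half volume.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    write_wav_pcm16("tone.wav", tone, rate)
    print(f"{wav_duration_seconds('tone.wav'):.2f} s")  # 1.00 s
```

Clipping before scaling keeps stray peaks above 1.0 from wrapping around into loud clicks when converted to integers.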

🌍 Language Support: Speak Your Language

No matter where you're from, HelpingAI-TTS-v1 has you covered. It officially supports 21 languages (20 Indic languages plus English) and has unofficial support for three more. That’s serious range right there. 🌏

Officially supported:
Assamese
Bengali
Bodo
Dogri
English
Gujarati
Hindi
Kannada
Konkani
Maithili
Malayalam
Manipuri
Marathi
Nepali
Odia
Sanskrit
Santali
Sindhi
Tamil
Telugu
Urdu

Unofficially supported:
Chhattisgarhi
Kashmiri
Punjabi
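If your app lets users pick a language, a tiny lookup can warn them before they try something outside the supported set. This is a minimal sketch: the language names come from the list on this card, the split assumes (as in the base ai4bharat/indic-parler-tts model) that Chhattisgarhi, Kashmiri, and Punjabi are the unofficially supported ones, and support_status is a hypothetical helper, not part of any library.

```python
# Language names copied from this model card. The official/unofficial split
# is an assumption based on the base ai4bharat/indic-parler-tts model.
OFFICIAL = {
    "Assamese", "Bengali", "Bodo", "Dogri", "English", "Gujarati", "Hindi",
    "Kannada", "Konkani", "Maithili", "Malayalam", "Manipuri", "Marathi",
    "Nepali", "Odia", "Sanskrit", "Santali", "Sindhi", "Tamil", "Telugu", "Urdu",
}
UNOFFICIAL = {"Chhattisgarhi", "Kashmiri", "Punjabi"}


def support_status(language: str) -> str:
    """Classify a language name as 'official', 'unofficial', or 'not listed'."""
    name = language.strip().title()  # tolerate "hindi" or " Punjabi "
    if name in OFFICIAL:
        return "official"
    if name in UNOFFICIAL:
        return "unofficial"
    return "not listed"


if __name__ == "__main__":
    for lang in ("hindi", "Punjabi", "French"):
        print(lang, "->", support_status(lang))
```

Treat "unofficial" as "may work, test it first" rather than a hard block.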

Powered by HelpingAI, where we blend emotional intelligence with tech. 🌟