library_name: transformers
tags:
- text-to-speech
- annotation
license: apache-2.0
language:
- en
- as
- bn
- gu
- hi
- kn
- ks
- or
- ml
- mr
- ne
- pa
- sa
- sd
- ta
- te
- ur
- om
pipeline_tag: text-to-speech
inference: false
base_model:
- ai4bharat/indic-parler-tts
HelpingAI-TTS-v1
Yo, what's good! Welcome to HelpingAI-TTS-v1, your go-to for next-level Text-to-Speech (TTS) that's all about personalization, vibes, and clarity. Whether you want your text to sound cheerful, emotional, or just like you're chatting with a friend, this model's got you covered.
What's HelpingAI-TTS-v1?
HelpingAI-TTS-v1 is a beast when it comes to generating high-quality, customizable speech. It doesn't just read your text in a flat, generic voice; it picks up on the style you ask for and brings your words to life. Add a description of how the speech should sound, like how fast or slow it should be or whether it's cheerful or serious, and BOOM: you've got yourself the audio you wanted.
How It Works: A Quick Rundown
- Transcript: The text you want to speak. Keep it casual, formal, or whatever suits your vibe.
- Caption: Describes how you want the speech to sound. Want a fast-paced, hype vibe or a calm, soothing tone? Just say it.
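To make the transcript/caption split concrete, here's a minimal sketch of putting a caption together from a few style knobs. Heads up: `build_caption` and its parameters (`tone`, `pace`, `extra`) are hypothetical names made up for this example, not part of the model or the library. The caption is just a plain-English string, so you can also write it by hand.

```python
def build_caption(tone: str, pace: str, extra: str = "") -> str:
    """Compose a plain-English caption describing how the speech should sound.

    Hypothetical helper for illustration: the caption is a free-form string,
    so this only saves some typing when you try out different styles.
    """
    caption = f"A {tone} tone with a {pace} speed."
    if extra:
        caption += f" {extra.strip()}"
    return caption


# Transcript: the text you want spoken
transcript = "Hey, what's up? How's it going?"

# Caption: how you want it to sound
caption = build_caption(
    tone="friendly, upbeat, and casual",
    pace="moderate",
    extra="Speaker sounds confident and relaxed.",
)
print(caption)
```

Swap the `tone` and `pace` values to try a different vibe while keeping the transcript the same.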
Features You'll Love:
- Expressive Speech: This isn't just any TTS. You can describe the tone, speed, and vibe you want. Whether it's a peppy "Hey!" or a chill "What's up?", this model's got your back.
- Top-Notch Quality: Super clean audio. No static. Just pure, high-quality sound that makes your words pop.
- Customizable Like Never Before: Play with emotions, tone, and even accents. It's all about making it personal.
Get Started: Installation
Ready to vibe? Here's how you set up HelpingAI-TTS-v1 in seconds:
pip install git+https://github.com/huggingface/parler-tts.git
Usage: Let's Make Some Magic
Here's the code that gets the job done. Super simple to use: just plug in your text and describe how you want it to sound. It's like setting the mood for a movie.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
# Choose your device (GPU or CPU)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Load the model and tokenizers
model = ParlerTTSForConditionalGeneration.from_pretrained("HelpingAI/HelpingAI-TTS-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI-TTS-v1")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
# Customize your inputs: text + description
prompt = "Hey, what's up? How's it going?"
description = "A friendly, upbeat, and casual tone with a moderate speed. Speaker sounds confident and relaxed."
# Tokenize the inputs
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
# Generate the audio
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
# Save the audio to a file
sf.write("output.wav", audio_arr, model.config.sampling_rate)
This will create a super clean .wav file with the speech you asked for.
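Want to hear the same line in a few different moods? Here's a rough sketch that reuses the objects from the script above and writes one file per caption. The helper names (`plan_variants`, `render_variants`), the labels, and the `outputs` folder are all assumptions for illustration; the tokenizer and `generate` calls are the same ones used above.

```python
from pathlib import Path


def plan_variants(captions: dict[str, str], out_dir: str = "outputs") -> list[tuple[Path, str]]:
    """Pair each caption with an output path like outputs/<label>.wav."""
    return [(Path(out_dir) / f"{label}.wav", text) for label, text in captions.items()]


def render_variants(model, tokenizer, description_tokenizer, prompt, captions,
                    device="cpu", out_dir="outputs"):
    """Generate one .wav per caption for the same transcript (hypothetical helper)."""
    import soundfile as sf  # imported here so plan_variants works without it installed

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # The transcript is the same for every variant, so tokenize it once
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    for path, caption in plan_variants(captions, out_dir):
        # Only the caption changes between variants
        input_ids = description_tokenizer(caption, return_tensors="pt").input_ids.to(device)
        generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
        audio_arr = generation.cpu().numpy().squeeze()
        sf.write(str(path), audio_arr, model.config.sampling_rate)
```

With `model`, `tokenizer`, `description_tokenizer`, `prompt`, and `device` from the script above, a call like `render_variants(model, tokenizer, description_tokenizer, prompt, {"calm": "A calm, soothing tone at a slow speed.", "hype": "A fast-paced, excited tone."}, device=device)` should leave you with `outputs/calm.wav` and `outputs/hype.wav`.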
Language Support: Speak Your Language
No matter where you're from, HelpingAI-TTS-v1 has you covered. It officially supports 20+ languages, with unofficial support for a few more. That's global vibes right there.
- Assamese
- Bengali
- Bodo
- Dogri
- Kannada
- Malayalam
- Marathi
- Sanskrit
- Nepali
- English
- Telugu
- Hindi
- Gujarati
- Konkani
- Maithili
- Manipuri
- Odia
- Santali
- Sindhi
- Tamil
- Urdu
- Chhattisgarhi
- Kashmiri
- Punjabi
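If you're wiring this into an app, a quick check against the list above can catch unsupported requests before you spend time on generation. This is only a sketch: `LISTED_LANGUAGES` and `is_listed` are hypothetical names, and the set simply mirrors the list above without separating official from unofficial support.

```python
# Languages copied from the list above (official and unofficial, not distinguished)
LISTED_LANGUAGES = {
    "Assamese", "Bengali", "Bodo", "Dogri", "Kannada", "Malayalam",
    "Marathi", "Sanskrit", "Nepali", "English", "Telugu", "Hindi",
    "Gujarati", "Konkani", "Maithili", "Manipuri", "Odia", "Santali",
    "Sindhi", "Tamil", "Urdu", "Chhattisgarhi", "Kashmiri", "Punjabi",
}

# Lowercased copy so lookups ignore capitalization
_LISTED_FOLDED = {name.casefold() for name in LISTED_LANGUAGES}


def is_listed(language: str) -> bool:
    """Return True if the language name appears in the list above."""
    return language.strip().casefold() in _LISTED_FOLDED


print(is_listed("hindi"))
print(is_listed("French"))
```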
Powered by HelpingAI, where we blend emotional intelligence with tech.