library_name: transformers
tags:
  - text-to-speech
  - annotation
license: apache-2.0
language:
  - en
  - as
  - bn
  - gu
  - hi
  - kn
  - ks
  - or
  - ml
  - mr
  - ne
  - pa
  - sa
  - sd
  - ta
  - te
  - ur
  - om
pipeline_tag: text-to-speech
inference: false
base_model:
  - ai4bharat/indic-parler-tts

HelpingAI-TTS-v1 🎤🔥

Yo, what's good! Welcome to HelpingAI-TTS-v1, your go-to for next-level Text-to-Speech (TTS) that's all about personalization, vibes, and clarity. Whether you want your text to sound cheerful, emotional, or just like you're chatting with a friend, this model's got you covered. 💯

🚀 What's HelpingAI-TTS-v1?

HelpingAI-TTS-v1 is a beast when it comes to generating high-quality, customizable speech. It doesn't just spit out generic speech; it feels what you're saying and brings it to life with style. Add a description of how the speech should sound, like how fast or slow it should be, or whether it's cheerful or serious, and BOOM, you got yourself the perfect audio output. 🎧

๐Ÿ› ๏ธ How It Works: A Quick Rundown ๐Ÿ”ฅ

  1. Transcript: The text you want to speak. Keep it casual, formal, or whatever suits your vibe.
  2. Caption: Describes how you want the speech to sound. Want a fast-paced, hype vibe or a calm, soothing tone? Just say it. ๐Ÿ”ฅ
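
If you'd rather assemble captions from a few style hints than type them out each time, here's a minimal sketch. Heads up: `build_description` is a hypothetical helper made up for illustration, not part of the model or the `parler_tts` library, and the sentence template is just one guess at a caption the model handles well.

```python
def build_description(tone, pace="moderate", extras=()):
    """Assemble a caption string from style hints.

    Hypothetical helper for illustration only; the model just takes a
    free-form description string, so any wording works.
    """
    parts = [f"A {tone} tone with a {pace} speed."]
    parts.extend(extras)  # optional extra sentences, appended as-is
    return " ".join(parts)


# Transcript: what gets spoken. Caption: how it should sound.
transcript = "Hey, what's up? How's it going?"
caption = build_description(
    "friendly, upbeat",
    pace="moderate",
    extras=["Speaker sounds confident and relaxed."],
)
print(caption)
```

The resulting string can be dropped straight into the `description` variable in the usage code.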

💡 Features You'll Love:

  • Expressive Speech: This isn't just any TTS. You can describe the tone, speed, and vibe you want. Whether it's a peppy "Hey!" or a chill "What's up?", this model's got your back.
  • Top-Notch Quality: Super clean audio. No static. Just pure, high-quality sound that makes your words pop.
  • Customizable Like Never Before: Play with emotions, tone, and even accents. It's all about making it personal. 🌍

🔧 Get Started: Installation 🔥

Ready to vibe? Here's how you set up HelpingAI-TTS-v1 in seconds:

```bash
pip install git+https://github.com/huggingface/parler-tts.git
```
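
Want to double-check the setup before running anything heavy? Here's a small sketch that looks for the modules the usage code imports. `missing_packages` is a hypothetical helper name, and whether `soundfile` comes along with the install above is something to verify on your machine rather than assume.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the usage example (pip package names may differ)
required = ["torch", "transformers", "parler_tts", "soundfile"]
missing = missing_packages(required)
print("Missing:", ", ".join(missing) if missing else "none, you're good to go")
```

If anything shows up as missing, install it with `pip` before moving on.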

๐Ÿ–ฅ๏ธ Usage: Let's Make Some Magic ๐ŸŽค

Hereโ€™s the code that gets the job done. Super simple to use, just plug in your text and describe how you want it to sound. Itโ€™s like setting the mood for a movie.

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Choose your device (GPU or CPU)
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizers
model = ParlerTTSForConditionalGeneration.from_pretrained("HelpingAI/HelpingAI-TTS-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("HelpingAI/HelpingAI-TTS-v1")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Customize your inputs: text + description
prompt = "Hey, what's up? How's it going?"
description = "A friendly, upbeat, and casual tone with a moderate speed. Speaker sounds confident and relaxed."

# Tokenize the inputs
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the audio
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()

# Save the audio to a file
sf.write("output.wav", audio_arr, model.config.sampling_rate)
```

This will create a super clean .wav file with the speech you asked for. 🔥
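
For a quick sanity check on the result, here's a sketch that reads the clip length from the file header using only the standard library. It assumes `output.wav` is a plain PCM WAV file, which is what Python's `wave` module can read; `wav_duration_seconds` is a hypothetical helper name.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Only runs if the usage example has already written the file
if os.path.exists("output.wav"):
    print(f"output.wav is {wav_duration_seconds('output.wav'):.2f} seconds long")
```

A duration near zero, or one far longer than the transcript should take to say, is a hint that generation went sideways.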

๐ŸŒ Language Support: Speak Your Language

No matter where you're from, HelpingAI-TTS-v1 has you covered. Officially supporting 20+ languages and unofficial support for a few more. Thatโ€™s global vibes right there. ๐ŸŒ

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Malayalam
  • Marathi
  • Sanskrit
  • Nepali
  • English
  • Telugu
  • Hindi
  • Gujarati
  • Konkani
  • Maithili
  • Manipuri
  • Odia
  • Santali
  • Sindhi
  • Tamil
  • Urdu
  • Chhattisgarhi
  • Kashmiri
  • Punjabi

Powered by HelpingAI, where we blend emotional intelligence with tech. 🌟