metadata
title: Text To Speech With Pitch Controls
emoji: πŸ†
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.22.0
app_file: app.py
pinned: false
license: mit
  1. Libraries and Tools Used:

    • Transformers: Provides VitsModel and AutoTokenizer, used here to load facebook/mms-tts-eng, the English text-to-speech checkpoint from Facebook's (Meta AI's) Massively Multilingual Speech (MMS) project.
    • Torch: PyTorch, the tensor framework that Transformers runs on; it performs the model inference on the tokenized text.
    • Librosa: An audio-processing library, used here to shift the pitch of the generated speech.
    • Soundfile: Writes the speech output to an audio file.
    • Tempfile: A Python standard-library module that creates temporary files for intermediate storage during processing.
    • Gradio: Facilitates the creation of a user-friendly web interface for the text-to-speech application.
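
Before running the app locally, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED_PACKAGES` list are hypothetical and not part of the Space's code.

```python
from importlib.util import find_spec

# Third-party packages named in the list above; tempfile ships with Python.
REQUIRED_PACKAGES = ["transformers", "torch", "librosa", "soundfile", "gradio"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_packages()` returns the packages that still need to be installed, for example with pip; an empty list means everything is available.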
  2. Pipeline for Text-to-Speech Conversion:

    • Text Input: You type the text you want converted to speech.
    • Tokenization: AutoTokenizer converts the text into the token IDs the speech model expects.
    • Speech Synthesis: VitsModel, loaded with the facebook/mms-tts-eng weights, generates the speech waveform from the tokenized text.
    • Pitch Adjustment: A slider sets how far the pitch of the generated speech is shifted.
        ◦ 0: The normal, unaltered pitch. This is the default; the voice sounds as the model produced it.
        ◦ Negative values: Make the voice sound higher, like moving up the notes on a piano, giving a more youthful or feminine tone.
        ◦ Positive values: Make the voice sound lower, like moving down the notes on a piano, giving a deeper, more resonant tone often associated with a more masculine or mature voice.
    • Saving Audio: The pitch-adjusted speech is written to a temporary audio file using Soundfile and Tempfile.
    • Interactive Web Interface: Gradio provides an interface where you input text, adjust the pitch using a slider, and listen to the speech output.
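
The steps above can be sketched end to end as follows. This is a minimal sketch, not the Space's actual app.py: the function names (`load_model`, `slider_to_n_steps`, `frequency_ratio`, `text_to_speech`, `build_demo`) and the slider range of -12 to 12 semitones are hypothetical. One point needs care: librosa's `pitch_shift` raises the pitch for positive `n_steps`, so to match the convention described above (negative = higher) the sketch negates the slider value. Whether the app does the same is an assumption.

```python
import tempfile
from functools import lru_cache

MODEL_ID = "facebook/mms-tts-eng"  # model named in the text


def slider_to_n_steps(pitch_value: float) -> float:
    """Map the slider value to librosa's n_steps (in semitones).

    Assumption: the sign is flipped so that negative slider values raise
    the pitch, as the text describes. librosa itself treats positive
    n_steps as "higher".
    """
    return -float(pitch_value)


def frequency_ratio(n_steps: float) -> float:
    """Frequency multiplier for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


@lru_cache(maxsize=1)
def load_model():
    """Load the tokenizer and model once and reuse them across requests."""
    # Heavy dependencies are imported lazily so the helpers above work without them.
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = VitsModel.from_pretrained(MODEL_ID)
    model.eval()
    return tokenizer, model


def text_to_speech(text: str, pitch_value: float = 0.0) -> str:
    """Synthesize `text`, shift its pitch, and return the path of a WAV file."""
    import librosa
    import soundfile as sf
    import torch

    tokenizer, model = load_model()

    # Tokenization and speech synthesis.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)
    audio = waveform.squeeze().cpu().numpy()
    sample_rate = model.config.sampling_rate

    # Pitch adjustment; a slider value of 0 leaves the audio untouched.
    n_steps = slider_to_n_steps(pitch_value)
    if n_steps != 0:
        audio = librosa.effects.pitch_shift(audio, sr=sample_rate, n_steps=n_steps)

    # Saving: reserve a temporary file name, then write the audio to it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        out_path = tmp.name
    sf.write(out_path, audio, sample_rate)
    return out_path


def build_demo():
    """Build the Gradio interface: text box and pitch slider in, audio out."""
    import gradio as gr

    return gr.Interface(
        fn=text_to_speech,
        inputs=[
            gr.Textbox(label="Text"),
            # Hypothetical range: one octave in either direction.
            gr.Slider(minimum=-12, maximum=12, step=1, value=0, label="Pitch"),
        ],
        outputs=gr.Audio(type="filepath", label="Speech"),
    )
```

To try it locally, call `build_demo().launch()`. Each semitone changes the frequency by a factor of 2^(1/12), so under this sign convention a slider value of -12 doubles the pitch (one octave up) and a value of 12 halves it.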