# Hugging Face Space: ales/ai-audio-books


SOUND_EFFECT_GENERATION = """
You are helping me make an audiobook with realistic, emotionally expressive sound using TTS.
You are tasked with generating a description of sound effects
that matches the atmosphere, actions, and tone of a given sentence or text from a book.
The description should be tailored to create a sound effect using ElevenLabs' sound generation API.
The generated sound description must evoke the scene
or emotions from the text (e.g., footsteps, wind, tense silence, etc.),
and it should be succinct and fit the mood of the text.

Additionally, you should include the following parameters in your response:

    text: A generated description of the sound that matches the text provided.
        Keep the description simple and effective at capturing the soundscape.
        This text will be converted into a sound effect.
    duration_seconds: The appropriate duration of the sound effect,
        calculated from the length and nature of the scene.
        Cap this duration at 22 seconds. Scale it to the input: very long input text
        calls for a long sound effect, and short text for a short one.
        The duration should roughly match how long it takes to read the input text aloud.
    prompt_influence: A value between 0 and 1, where a higher value makes the generated sound
        follow the description more closely. For general sound effects (e.g., footsteps, background ambiance),
        use a value around 0.3. For more specific or detailed sound scenes
        (e.g., thunderstorm, battle sounds), use a higher value, such as 0.5 to 0.7.

Your output should be in the following JSON format:

{
  "text": "A soft breeze rustling through leaves, distant birds chirping.",
  "duration_seconds": 4.0,
  "prompt_influence": 0.4
}

"""

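
# --- Illustrative sketch (hypothetical helper, not part of the original module) ---
# One way a caller might parse and sanity-check the JSON that
# SOUND_EFFECT_GENERATION asks the model to return. The function name and the
# clamping policy are assumptions; only the 22-second cap and the 0-1 range
# for prompt_influence come from the prompt text above. A model can ignore
# instructions, so the limits are enforced again in code.
import json

MAX_SOUND_EFFECT_SECONDS = 22.0  # cap stated in SOUND_EFFECT_GENERATION


def parse_sound_effect_response(raw: str) -> dict:
    """Parse the model's JSON reply and clamp values to the ranges in the prompt."""
    data = json.loads(raw)
    text = str(data["text"]).strip()
    if not text:
        raise ValueError("empty sound description")
    duration = float(data["duration_seconds"])
    if duration <= 0:
        raise ValueError("duration_seconds must be positive")
    influence = float(data["prompt_influence"])
    return {
        "text": text,
        # Enforce the 22-second cap even if the model exceeds it.
        "duration_seconds": min(duration, MAX_SOUND_EFFECT_SECONDS),
        # Clamp prompt_influence into [0, 1].
        "prompt_influence": min(max(influence, 0.0), 1.0),
    }


# Usage: a reply with "duration_seconds": 30.0 and "prompt_influence": 1.4
# comes back with 22.0 and 1.0 respectively.
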
SOUND_EFFECT_GENERATION_WITHOUT_DURATION_PREDICTION = """
You are helping me make an audiobook with realistic, emotionally expressive sound using TTS.
You are tasked with generating a description of sound effects
that matches the atmosphere, actions, and tone of a given sentence or text from a book.
The description should be tailored to create a sound effect using ElevenLabs' sound generation API.
The generated sound description must evoke the scene
or emotions from the text (e.g., footsteps, wind, tense silence, etc.),
and it should be succinct and fit the mood of the text.

Additionally, you should include the following parameters in your response:

    text: A generated description of the sound that matches the text provided.
        Keep the description simple and effective at capturing the soundscape.
        This text will be converted into a sound effect.
    prompt_influence: A value between 0 and 1, where a higher value makes the generated sound
        follow the description more closely. For general sound effects (e.g., footsteps, background ambiance),
        use a value around 0.3. For more specific or detailed sound scenes
        (e.g., thunderstorm, battle sounds), use a higher value, such as 0.5 to 0.7.

Your output should be in the following JSON format:

{
  "text": "A soft breeze rustling through leaves, distant birds chirping.",
  "prompt_influence": 0.4
}

"""

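
# --- Illustrative sketch (hypothetical helpers, not part of the original module) ---
# SOUND_EFFECT_GENERATION_WITHOUT_DURATION_PREDICTION does not ask the model for
# duration_seconds, so the caller has to choose one. This sketch estimates it from
# the narration length of the source text. The speaking rate of 150 words per
# minute and the 0.5-second floor are assumptions, not values from this project;
# the 22-second cap mirrors SOUND_EFFECT_GENERATION.
import json

ASSUMED_WORDS_PER_MINUTE = 150.0  # assumption: typical narration pace
ASSUMED_MIN_SECONDS = 0.5  # assumption: avoid zero-length effects
DURATION_CAP_SECONDS = 22.0


def estimate_duration_seconds(source_text: str) -> float:
    """Approximate how long the text takes to read aloud, clamped to a safe range."""
    words = len(source_text.split())
    seconds = words * 60.0 / ASSUMED_WORDS_PER_MINUTE
    return min(max(seconds, ASSUMED_MIN_SECONDS), DURATION_CAP_SECONDS)


def build_sound_effect_params(raw_reply: str, source_text: str) -> dict:
    """Combine the model's text/prompt_influence with a locally estimated duration."""
    data = json.loads(raw_reply)
    return {
        "text": str(data["text"]).strip(),
        "duration_seconds": estimate_duration_seconds(source_text),
        "prompt_influence": min(max(float(data["prompt_influence"]), 0.0), 1.0),
    }


# Design note: estimating duration locally ties it to the actual text length
# instead of the model's guess; a 10-word sentence maps to 4.0 seconds at the
# assumed rate.
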
TEXT_MODIFICATION = """
You are helping me make an audiobook with a realistic, emotionally expressive voice using TTS.
You are tasked with adjusting the emotional tone of a given text
by modifying it with special characters such as "!", "...", "-", and "~",
and by writing words in uppercase, to add emphasis or convey emotion. To add more emotion, you can
repeat special characters, for example "!!!".
Do not remove any words or add new ones.
Only alter the presentation of the existing words.
After modifying the text, adjust the "stability", "similarity_boost", and "style" parameters
according to the level of emotional intensity in the modified text.
Higher emotional intensity should lower "stability" and raise "similarity_boost".

Your output should be in the following JSON format:

{
  "modified_text": "Modified text with emotional adjustments.",
  "params": {
    "stability": 0.7,
    "similarity_boost": 0.5,
    "style": 0.3
  }
}

The "stability" parameter should range from 0 to 1,
with lower values indicating a more expressive, less stable voice.
The "similarity_boost" parameter should also range from 0 to 1,
with higher values indicating more emphasis on the voice similarity.
The "style" parameter should also range from 0 to 1,
where lower values indicate a neutral tone and higher values reflect more stylized or emotional delivery.
Adjust all three according to the emotional intensity of the text.

Example of input text that could be passed in:

Text: "I can't believe this is happening."
"""
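

# --- Illustrative sketch (hypothetical helper, not part of the original module) ---
# One way to check a TEXT_MODIFICATION reply before passing it to TTS. It verifies
# that only the presentation changed (same words in the same order, ignoring case
# and added characters such as "!", "...", "-", "~") and that "stability",
# "similarity_boost", and "style" all lie in [0, 1]. The function names and the
# word-matching rule are assumptions; a hyphen inserted inside a word
# (e.g. "hap-pening") is reported as a change.
import json
import re

_WORD_RE = re.compile(r"[a-z0-9']+")


def _words(text: str) -> list:
    # Lowercase first so uppercase emphasis does not count as a different word.
    return _WORD_RE.findall(text.lower())


def validate_text_modification(original: str, raw_reply: str) -> dict:
    """Parse the reply and reject it if words changed or a parameter is out of range."""
    data = json.loads(raw_reply)
    if _words(data["modified_text"]) != _words(original):
        raise ValueError("modified_text adds, removes, or changes words")
    for key in ("stability", "similarity_boost", "style"):
        value = float(data["params"][key])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1, got {value}")
    return data


# Usage: for the example text in TEXT_MODIFICATION, a reply whose modified_text
# is "I CAN'T believe this is happening!!!" passes, while one that inserts
# "really" raises ValueError.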