---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- audiocraft
- audiogen
- styletts2
- audio
- synthesis
- shift
- audeering
- dkounadis
- sound
- scene
- acoustic-scene
---


# Affective TTS & Soundscape Synthesis

Affective TTS tool for [SHIFT Horizon](https://shift-europe.eu/).
  - Synthesizes affective speech with a soundscape (trees, water, leaves, ambient background) from plain text or subtitles (`.srt`) and overlays it onto videos.
  - `134` built-in affective voices, tuned for [StyleTTS2](https://github.com/yl4579/StyleTTS2).
  - [GitHub](https://github.com/audeering/shift)

### Available Voices

<a href="https://audeering.github.io/shift/">Listen to available voices!</a>

## Flask API

Install

```
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt
```

Start Flask

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```
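Once the server is up, other machines can call it over HTTP. The snippet below is only a sketch: the `/tts` route, the default Flask port `5000`, and the payload fields are assumptions made for illustration; check `api.py` for the actual endpoint and parameters.

```python
# Hypothetical client sketch -- the /tts route and payload fields are
# assumptions; check api.py for the real endpoint and parameters.
import json
import urllib.request

payload = {
    "text": "Hello from SHIFT.",
    "voice": "en_US/m-ailabs_low#mary_ann",
    "affective": True,
}
req = urllib.request.Request(
    "http://localhost:5000/tts",  # default Flask port; adjust if api.py binds elsewhere
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request once api.py is running.
```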

## Inference

The commands below require `api.py` to be running, e.g. on `computeXX`.

**Text 2 Speech**

```
# Basic TTS - see Available Voices above
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective

# Voice cloning
python tts.py --text sample.txt --native assets/native_voice.wav
```

**Image 2 Video**

```
# Make a video narrating an image - all TTS args above also apply here
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```

**Video 2 Video**

```
# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
```
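For dubbing, `tts.py` aligns the synthesized speech to the cue timestamps in the `.srt` file. Below is a minimal sketch of that time-stamped format; the cue text is invented for illustration.

```python
# Minimal illustration of the time-stamped .srt format that tts.py consumes.
# Each cue: index, "start --> end" timestamps (HH:MM:SS,mmm), then the text.
import re

srt = """1
00:00:01,000 --> 00:00:04,000
The head of Fortuna is carved from marble.

2
00:00:05,500 --> 00:00:08,200
It was discovered during the excavation.
"""

with open("sample.srt", "w", encoding="utf-8") as f:
    f.write(srt)

cues = re.findall(
    r"(\d+)\n(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)\n(.+)", srt
)
print(len(cues))  # 2
```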

## Examples

Substituting the native voice via TTS

[![Native voice ANBPR video](assets/native_video_thumb.png)](https://www.youtube.com/watch?v=tmo2UbKYAqc)


The same video, with the native voice replaced by an English TTS voice of similar emotion


[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)


<details>
<summary>

Video dubbing from subtitles `.srt`

</summary>

## Video Dubbing

[![Review demo SHIFT](assets/review_demo_thumb.png)](https://www.youtube.com/watch?v=bpt7rOBENcQ)

Generate the dubbed video:

```
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
```


</details>

## Joint Application of D3.1 & D3.2

<a href="https://youtu.be/wWC8DpOKVvQ" title="Subtitles to Video">![Subtitles to Video](assets/caption_to_video_thumb.png)</a>


Create a video from an image and text:

```
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```