---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- audiocraft
- audiogen
- styletts2
- audio
- synthesis
- shift
- audeering
- dkounadis
- sound
- scene
- acoustic-scene
---


# Affective TTS & Soundscape Synthesis

Affective TTS tool for [SHIFT Horizon](https://shift-europe.eu/).
  - Synthesizes affective speech with a soundscape (trees, water, leaves, ambient background) from plain text or subtitles (`.srt`) and overlays it onto videos.
  - `134` built-in affective voices, tuned for [StyleTTS2](https://github.com/yl4579/StyleTTS2).
  - [GitHub](https://github.com/audeering/shift)

### Available Voices

<a href="https://audeering.github.io/shift/">Listen to available voices!</a>

## Flask API

Install

```
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt
```

Start Flask

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```
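Once the server is up, other machines can call it over HTTP. The snippet below is only a sketch: the `/tts` route, the default Flask port `5000`, and the payload fields are assumptions made for illustration; check `api.py` for the actual endpoint and parameters.

```python
# Hypothetical client sketch -- the /tts route and payload fields are
# assumptions; check api.py for the real endpoint and parameters.
import json
import urllib.request

payload = {
    "text": "Hello from SHIFT.",
    "voice": "en_US/m-ailabs_low#mary_ann",
    "affective": True,
}
req = urllib.request.Request(
    "http://localhost:5000/tts",  # default Flask port; adjust if api.py binds elsewhere
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request once api.py is running.
```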

## Inference

The commands below require `api.py` to be running, e.g. on `computeXX`.

**Text 2 Speech**

```
# Basic TTS - see Available Voices above
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective

# Voice cloning
python tts.py --text sample.txt --native assets/native_voice.wav
```

**Image 2 Video**

```
# Make a video narrating an image - all TTS args above also apply here
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```

**Video 2 Video**

```
# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
```
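For dubbing, `tts.py` aligns the synthesized speech to the cue timestamps in the `.srt` file. Below is a minimal sketch of that time-stamped format; the cue text is invented for illustration.

```python
# Minimal illustration of the time-stamped .srt format that tts.py consumes.
# Each cue: index, "start --> end" timestamps (HH:MM:SS,mmm), then the text.
import re

srt = """1
00:00:01,000 --> 00:00:04,000
The head of Fortuna is carved from marble.

2
00:00:05,500 --> 00:00:08,200
It was discovered during the excavation.
"""

with open("sample.srt", "w", encoding="utf-8") as f:
    f.write(srt)

cues = re.findall(
    r"(\d+)\n(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)\n(.+)", srt
)
print(len(cues))  # 2
```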

## Examples

Substituting the native voice via TTS

[![Native voice ANBPR video](assets/native_video_thumb.png)](https://www.youtube.com/watch?v=tmo2UbKYAqc)


The same video, with the native voice replaced by an English TTS voice of similar emotion


[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)


<details>
<summary>

Video dubbing from subtitles `.srt`

</summary>

## Video Dubbing

[![Review demo SHIFT](assets/review_demo_thumb.png)](https://www.youtube.com/watch?v=bpt7rOBENcQ)

Generate the dubbed video:

```
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
```


</details>

## Joint Application of D3.1 & D3.2

<a href="https://youtu.be/wWC8DpOKVvQ" title="Subtitles to Video">![Subtitles to Video](assets/caption_to_video_thumb.png)</a>


Create a video from an image and text:

```
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```