---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- audiocraft
- audiogen
- styletts2
- audio
- synthesis
- shift
- audeering
- dkounadis
- sound
- scene
- acoustic-scene
---
# Affective TTS & Soundscape Synthesis
Affective TTS tool for [SHIFT Horizon](https://shift-europe.eu/).
- Synthesizes affective speech with a background soundscape (trees, water, leaves) from plain text or subtitles (`.srt`) and overlays it onto videos.
- `134` built-in affective voices, tuned for [StyleTTS2](https://github.com/yl4579/StyleTTS2).
- [GitHub](https://github.com/audeering/shift)
### Available Voices
<a href="https://audeering.github.io/shift/">Listen to available voices!</a>
## Flask API
Install
```
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt
```
Start Flask
```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```
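Before calling the CLI tools, it can help to confirm that the Flask process is actually listening. A minimal sketch, assuming the default Flask host/port (`127.0.0.1:5000`), which `api.py` may well override:
```python
# Reachability check for the Flask back end started above.
# Host/port are assumptions (Flask defaults) - adjust to api.py's actual configuration.
import socket

HOST, PORT = "127.0.0.1", 5000  # assumed, not confirmed by api.py

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2.0)
    reachable = s.connect_ex((HOST, PORT)) == 0

print("api.py reachable" if reachable else "api.py not reachable - check host/port")
```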
## Inference
The following commands require `api.py` to be running, e.g. `.. on computeXX`.
**Text 2 Speech**
```
# Basic TTS - see Available Voices above for the list of built-in voice IDs
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective
# Voice cloning - mimic the speaker of a reference .wav
python tts.py --text sample.txt --native assets/native_voice.wav
```
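To render the same text with several built-in voices, the CLI can be driven from a short script. A minimal sketch using `subprocess`; only `en_US/m-ailabs_low#mary_ann` is taken from the example above, the commented entry is a placeholder to be replaced with an ID from the voice listing:
```python
import subprocess

# Only "en_US/m-ailabs_low#mary_ann" comes from the example above;
# the commented entry is a placeholder - pick real IDs from the Available Voices page.
voices = [
    "en_US/m-ailabs_low#mary_ann",
    # "some_other_voice_id",
]

for voice in voices:
    subprocess.run(
        ["python", "tts.py", "--text", "sample.txt", "--voice", voice, "--affective"],
        check=True,
    )
```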
**Image 2 Video**
```
# Make a video narrating a still image - all TTS arguments above also apply here
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```
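Since the TTS arguments carry over, narration can also be scripted for several images at once. A minimal sketch, where `assets/images/` is a hypothetical folder:
```python
import pathlib
import subprocess

# assets/images/ is a hypothetical folder - point the glob at your own images.
for image in sorted(pathlib.Path("assets/images").glob("*.jpg")):
    subprocess.run(
        ["python", "tts.py", "--text", "sample.txt", "--image", str(image)],
        check=True,
    )
```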
**Video 2 Video**
```
# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
```
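The dubbing command consumes a standard time-stamped `.srt` file. If the subtitles are produced programmatically, a minimal sketch of writing one (file name, segment texts, and timings are all illustrative):
```python
# Write a minimal .srt file that can be passed to tts.py via --text.
# Segment texts and timings are illustrative only.
segments = [
    ("00:00:00,000", "00:00:03,500", "The head of Fortuna was found near the temple."),
    ("00:00:03,500", "00:00:07,000", "It is now on display in the local museum."),
]

with open("narration.srt", "w", encoding="utf-8") as srt:
    for i, (start, end, text) in enumerate(segments, start=1):
        srt.write(f"{i}\n{start} --> {end}\n{text}\n\n")
```
The resulting `narration.srt` can then be passed to `--text` together with `--video`, exactly as in the dubbing command above.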
## Examples
Substituting the native voice via TTS:
[![Native voice ANBPR video](assets/native_video_thumb.png)](https://www.youtube.com/watch?v=tmo2UbKYAqc)
The same video, with the native voice replaced by an English TTS voice of similar emotion:
[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)
<details>
<summary>
Video dubbing from subtitles `.srt`
</summary>
## Video Dubbing
[![Review demo SHIFT](assets/review_demo_thumb.png)](https://www.youtube.com/watch?v=bpt7rOBENcQ)
Generate dubbed video:
```
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
```
</details>
## Joint Application of D3.1 & D3.2
<a href="https://youtu.be/wWC8DpOKVvQ" title="Subtitles to Video">![Subtitles to Video demo](assets/caption_to_video_thumb.png)</a>
Create a video from an image and a text description:
```
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```