---
license: mit
language:
- en
pipeline_tag: text-to-audio
tags:
- audiocraft
- audiogen
- styletts2
- shift
- audeering
- sound
- audio-generation
- text-to-speech
- mimic3
---


# Affective TTS / SoundScapes

  - [SHIFT TTS tool](https://github.com/audeering/shift) 
  - Analysis of TTS emotionality [#1](https://huggingface.co/dkounadis/artificial-styletts2/discussions/2)
  - Soundscapes `trees, water, ..` via [AudioGen](https://huggingface.co/dkounadis/artificial-styletts2/discussions/3)
  - `landscape2soundscape.py` generates a soundscape, overlays TTS on it, and creates a video from an image.
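
To illustrate what "overlaying" TTS on a soundscape means, here is a minimal sketch of mixing two mono signals. It is not taken from `landscape2soundscape.py`; the function name `overlay` and the parameter `soundscape_gain` are hypothetical, and both signals are assumed to be float arrays in `[-1, 1]` at the same sample rate.

```python
import numpy as np


def overlay(speech, soundscape, soundscape_gain=0.5):
    """Mix mono speech over a mono soundscape (hypothetical helper)."""
    speech = np.asarray(speech, dtype=np.float32)
    soundscape = np.asarray(soundscape, dtype=np.float32)
    if speech.size == 0 or soundscape.size == 0:
        raise ValueError("both signals must be non-empty")
    # Loop the soundscape until it covers the speech, then trim to length.
    repeats = -(-speech.size // soundscape.size)  # ceiling division
    background = np.tile(soundscape, repeats)[: speech.size]
    mix = speech + soundscape_gain * background
    # Rescale only if the sum would clip outside [-1, 1].
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Looping the background keeps the output as long as the speech; a real pipeline might instead fade the soundscape in and out or resample it first.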

## Available Voices

<a href="https://audeering.github.io/shift/">Listen to available voices!</a>

## Flask API

<details>
<summary>
Create virtualenv
</summary>

Clone

```
git clone https://huggingface.co/dkounadis/artificial-styletts2
```
Install

```
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd artificial-styletts2/
pip install -r requirements.txt
```

</details>

Start Flask

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```
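
The three variables in the command above select the GPU and the model cache: `CUDA_DEVICE_ORDER=PCI_BUS_ID` numbers GPUs in PCI bus order (as `nvidia-smi` does), `CUDA_VISIBLE_DEVICES` restricts the process to the listed GPU, and `HF_HOME` sets the Hugging Face cache directory. If you prefer to launch the server from Python, a minimal sketch could look like this; the helper names `gpu_env` and `start_api` are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def gpu_env(gpu_index, hf_home="./hf_home"):
    """Return a copy of the current environment with GPU and cache settings."""
    env = dict(os.environ)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"       # match nvidia-smi numbering
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # expose only this GPU
    env["HF_HOME"] = hf_home                      # Hugging Face cache directory
    return env


def start_api(gpu_index=2, script="api.py"):
    """Start the script in a child process and return the Popen handle."""
    return subprocess.Popen([sys.executable, script], env=gpu_env(gpu_index))


# Example (run from the repository root):
# server = start_api(gpu_index=2)
```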

## Landscape 2 Soundscape

The following requires `api.py` to be already running, e.g. in a separate tmux session.

```
# TTS & soundscape, overlaid onto an .mp4
python landscape2soundscape.py
```
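
Because the script depends on a running server, a small pre-flight check can save a confusing connection error. The sketch below only tests whether a TCP port accepts connections. It assumes the server listens on `127.0.0.1:5000` (Flask's default); the host and port actually used by `api.py` may differ, and `wait_for_server` is a hypothetical helper.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=5000, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Call `wait_for_server()` before running `landscape2soundscape.py` and stop early if it returns `False`.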

# Videos / Examples

A video in which the native voice is replaced with an English TTS voice:


[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)

## Joint Application of D3.1 & D3.2

[![Subtitles to Video](assets/caption_to_video_thumb.png)](https://youtu.be/wWC8DpOKVvQ)


Create a video from an image and a text file:

```
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```
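
For readers curious how a still image and an audio track can be combined into a video, the sketch below builds an `ffmpeg` command for that purpose. It is an illustration only and not necessarily how `tts.py` works; `still_image_video_cmd` and the file names are hypothetical, and `ffmpeg` with `libx264` is assumed to be installed.

```python
def still_image_video_cmd(image, audio, output="out.mp4"):
    """Build an ffmpeg command showing one image for the length of the audio."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-loop", "1", "-i", image,   # repeat the single image as video frames
        "-i", audio,                 # e.g. synthesized speech (hypothetical file)
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",       # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",                 # stop when the audio ends
        output,
    ]


# Example:
# import subprocess
# subprocess.run(still_image_video_cmd("assets/image_from_T31.jpg", "speech.wav"), check=True)
```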

## Landscape 2 Soundscape

```
# Loads the image, text and sound-scene text, then creates an .mp4
python landscape2soundscape.py
```

For the SHIFT demo, in collaboration with [SMB](https://www.smb.museum/home/). YouTube videos:


[![01](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____01_Schick_AII840_001.jpg)](https://youtu.be/SSi3gUO4GtY)

[![02](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____02_Constable_AI555_001.jpg)](https://youtu.be/2YjxAPkdXIc)

[![03](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____03_Schinkel_WS200-002.jpg)](https://youtu.be/BhMh02knkco)



[![05](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____05_Blechen_FV40_001.jpg)](https://youtu.be/a3qk9S87v60)

[![06](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____06_Menzel_AI900_001.jpg)](https://youtu.be/3M0y9OYzDfU)

[![07](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____07_Courbet_AI967_001.jpg)](https://youtu.be/OBY666_By1k)

[![08](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____08_Monet_AI1013_001.jpg)](https://youtu.be/gnGCYLcdLsA)

[![10](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____10_Boecklin_967648_NG2-80_001_rsz.jpg)](https://www.youtube.com/watch?v=Y8QyYUgLaCg)

[![11](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____11_Liebermann_NG4-94_001.jpg)](https://youtu.be/XDDzxDSrhb0)

[![12](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____12_Slevogt_AII1022_001.jpg)](https://youtu.be/I3YYKiUzHpA)




# Live Demo - Paplay

Start Flask

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python live_api.py
```

Client: describe any sound in words and it will be played back to you.

```
python live_demo.py  # asks for text input and plays the soundscape
```

# Simple Demo

```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python demo.py
```