Higobeatz committed
Commit 2315c29 · 1 Parent(s): 17420f4

openvoice plugin

.ipynb_checkpoints/README-checkpoint.md DELETED
---
language:
- en
tags:
- myshell
- speech-to-speech
---
<!-- might put a [width=2000 * height=xxx] img here, this size best fits git page
<img src="resources/cover.png"> -->
<img src="resources/dreamvoice.png">

# DreamVoice: Text-guided Voice Conversion

--------------------

## Introduction

DreamVoice is an innovative approach to voice conversion (VC) that leverages text-guided generation to create personalized and versatile voice experiences. Unlike traditional VC methods, which require a target recording during inference, DreamVoice offers a more intuitive alternative: users specify the desired voice timbre through a text prompt.

For more details, please see our Interspeech paper: [DreamVoice](https://arxiv.org/abs/2406.16314)

To listen to demos and download the dataset, please visit the DreamVoice homepage: [Homepage](https://haidog-yaqub.github.io/dreamvoice_demo/)

## How to Use

To load the models, first install the required packages:

```
pip install -r requirements.txt
```
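
The OpenVoice plugin example below additionally assumes the `openvoice` package is installed and the OpenVoice `checkpoints_v2` converter weights have been downloaded (see the [OpenVoice repo](https://github.com/myshell-ai/OpenVoice)). A quick sanity check, mirroring the paths used in that snippet:

```python
from pathlib import Path

# Paths mirror the OpenVoice plugin example below; adjust if your
# checkpoints live elsewhere.
ckpt_converter = Path('checkpoints_v2/converter')
assert (ckpt_converter / 'config.json').exists(), 'missing OpenVoice config'
assert (ckpt_converter / 'checkpoint.pth').exists(), 'missing OpenVoice checkpoint'
```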

Then you can use the model with the following code:

- NEW! DreamVoice Plugin for OpenVoice (DreamVG + [OpenVoice](https://github.com/myshell-ai/OpenVoice))

```python
import torch
from dreamvoice import DreamVoice_Plugin
from dreamvoice.openvoice_utils import se_extractor
from openvoice.api import ToneColorConverter

device = 'cuda'

# init dreamvoice
dreamvoice = DreamVoice_Plugin(device=device)

# init openvoice
ckpt_converter = 'checkpoints_v2/converter'
openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
openvoice.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

# generate a target speaker embedding from a text prompt
prompt = 'young female voice, sounds young and cute'
target_se = dreamvoice.gen_spk(prompt)
target_se = target_se.unsqueeze(-1)

# extract the speaker embedding of the content source
source_path = 'examples/test2.wav'
source_se = se_extractor(source_path, openvoice).to(device)

# voice conversion
encode_message = "@MyShell"
openvoice.convert(
    audio_src_path=source_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path='output.wav',
    message=encode_message)
```
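
If you want to reuse a prompt-generated speaker across runs, one lightweight option is to cache the embedding tensor with plain `torch.save`/`torch.load` (a minimal sketch continuing from the snippet above; the file name is arbitrary):

```python
# cache the speaker embedding generated above
torch.save(target_se, 'dreamvoice_spk.pt')

# ...in a later run, skip gen_spk and load the cached embedding instead
target_se = torch.load('dreamvoice_spk.pt')
```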

- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)

```python
from dreamvoice import DreamVoice

# Initialize DreamVoice in plugin mode with CUDA device
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Description of the target voice
prompt = 'young female voice, sounds young and cute'
# Provide the path to the content audio and generate the converted audio
gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
# Save the converted audio
dreamvoice.save_audio('gen1.wav', gen_audio, sr)

# Save the speaker embedding if you like the generated voice
dreamvoice.save_spk_embed('voice_stash1.pt')
# Load the saved speaker embedding
dreamvoice.load_spk_embed('voice_stash1.pt')
# Use the saved speaker embedding for another audio sample
gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
```
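
Since `genvc` takes the prompt as an argument, sweeping several voice descriptions over the same content clip is just a loop (a sketch built on the API shown above; the prompts and output names are made-up examples):

```python
from dreamvoice import DreamVoice

dreamvoice = DreamVoice(mode='plugin', device='cuda')
prompts = [
    'deep male voice, calm and mature',
    'bright female voice, energetic and lively',
]
for i, prompt in enumerate(prompts):
    # convert the same content clip with each prompt
    gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
    dreamvoice.save_audio(f'gen_prompt{i}.wav', gen_audio, sr)
```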

## Training Guide

1. Download VCTK and LibriTTS-R.
2. Download the [DreamVoice Dataset](https://haidog-yaqub.github.io/dreamvoice_demo/).
3. Extract speaker embeddings and cache them in a local path:
   ```
   python dreamvoice/train_utils/prepare/prepare_se.py
   ```
4. Modify the training config and train your DreamVoice plugin (see the note on `accelerate` after this list):
   ```
   cd dreamvoice/train_utils/src
   accelerate launch train.py
   ```
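
If you have not set up `accelerate` before, it is typically configured once before launching; this is the standard Accelerate workflow, not something specific to this repo:

```
accelerate config   # answer the interactive prompts once per machine
accelerate launch train.py
```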


## Extra Features

- End-to-end DreamVoice VC Model

```python
from dreamvoice import DreamVoice

# Initialize DreamVoice in end-to-end mode with CUDA device
dreamvoice = DreamVoice(mode='end2end', device='cuda')
# Description of the target voice
prompt = 'young female voice, sounds young and cute'
# Provide the path to the content audio and generate the converted audio
gen_end2end, sr = dreamvoice.genvc('examples/test1.wav', prompt)
# Save the converted audio
dreamvoice.save_audio('gen_end2end.wav', gen_end2end, sr)

# Note: end-to-end mode does not support saving speaker embeddings.
# To reuse a voice generated in end-to-end mode, switch back to plugin mode
# and extract the speaker embedding from the generated audio.
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Use the previously generated file as the speaker audio
gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_end2end.wav')
# Save the new converted audio
dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
```

- DiffVC (Diffusion-based VC Model)

```python
from dreamvoice import DreamVoice

# Plugin mode can be used for traditional one-shot voice conversion
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Generate audio using traditional one-shot voice conversion
gen_tradition, sr = dreamvoice.simplevc('examples/test1.wav', speaker_audio='examples/speaker.wav')
# Save the converted audio
dreamvoice.save_audio('gen_tradition.wav', gen_tradition, sr)
```
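
All of the examples above hard-code `device='cuda'`. If the `device` argument accepts standard torch device strings (an assumption based on the usage above, not something this README states), a CPU fallback could look like:

```python
import torch
from dreamvoice import DreamVoice

# fall back to CPU when no GPU is available (assumes 'cpu' is accepted)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
dreamvoice = DreamVoice(mode='plugin', device=device)
```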

## Reference

If you find the code useful for your research, please consider citing:

```bibtex
@article{hai2024dreamvoice,
  title={DreamVoice: Text-Guided Voice Conversion},
  author={Hai, Jiarui and Thakkar, Karan and Wang, Helin and Qin, Zengyi and Elhilali, Mounya},
  journal={arXiv preprint arXiv:2406.16314},
  year={2024}
}
```