---
license: mit
pipeline_tag: feature-extraction
tags:
- bark
- tts
- hubert
- text-to-speech
---
# Bark-voice-cloning
Bark-voice-cloning is a model that processes the outputs of a HuBERT model and turns them into semantic tokens compatible with Bark text-to-speech.

This can be used for many things, including speech transfer and voice cloning.
# Voice cloning
Voice cloning is creating a new voice for text-to-speech from a sample of an existing voice.

Process:
1. Load your WAV audio file into your PyTorch application.
2. For the fine prompt, extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations) with EnCodec. (Bark uses these to reconstruct the voice.)
3. For the coarse prompt, take the first two codebooks of the fine prompt: `coarse_prompt = fine_prompt[:2, :]`.
4. For the semantics, run the audio through a HuBERT model without K-means quantization (I personally use the `HubertWithKmeans` implementation from [audiolm-pytorch](https://github.com/lucidrains/audiolm-pytorch), edited to skip the K-means step).
5. To get the actual semantic tokens, run the HuBERT output through this model. The output is compatible with Bark.
6. Save the three arrays in an npz with `numpy.savez(path, semantic_prompt=semantics, fine_prompt=fine, coarse_prompt=coarse)`. This is your speaker file containing your cloned voice.
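Steps 3 and 6 above can be sketched with placeholder arrays. The shapes and token ranges below are illustrative assumptions standing in for real EnCodec and HuBERT outputs, not values prescribed by the model:

```python
import numpy as np

# Placeholder arrays standing in for real extractor outputs:
# - fine_prompt: EnCodec discrete codes, shape (n_codebooks, n_frames)
# - semantics: semantic tokens produced by this model from HuBERT features
fine_prompt = np.random.randint(0, 1024, size=(8, 512), dtype=np.int64)
semantics = np.random.randint(0, 10000, size=(256,), dtype=np.int64)

# Step 3: the coarse prompt is just the first two codebooks of the fine prompt.
coarse_prompt = fine_prompt[:2, :]

# Step 6: bundle everything into a speaker file.
np.savez(
    "speaker.npz",
    semantic_prompt=semantics,
    fine_prompt=fine_prompt,
    coarse_prompt=coarse_prompt,
)

# Sanity check: reload and confirm the expected keys and shapes.
speaker = np.load("speaker.npz")
print(sorted(speaker.files))           # ['coarse_prompt', 'fine_prompt', 'semantic_prompt']
print(speaker["coarse_prompt"].shape)  # (2, 512)
```

The resulting `speaker.npz` has the same three keys as the built-in Bark speaker files, so it can be passed anywhere Bark accepts a history prompt.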
# Voice masking
Voice masking is replacing a voice in an audio clip for speech-to-speech.

## Random
Replacing a voice in an audio clip with a voice generated by Bark.

Process:
1. Extract semantic tokens from the audio clip using HuBERT and this model.
2. Run `semantic_to_waveform` from `bark.api` with the extracted semantics.
3. The previous step returns the generated audio.
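A minimal sketch of the random-masking flow, assuming the `bark` package is installed and that `semantics` is the token array produced by steps 4-5 of the voice-cloning section. The import is deferred into the function so the sketch can be read without Bark installed:

```python
import numpy as np

def replace_voice_random(semantics: np.ndarray) -> np.ndarray:
    """Resynthesize semantic tokens with a random Bark-generated voice.

    `semantics` is assumed to be the semantic-token array obtained by
    running HuBERT output through this model. Requires the `bark`
    package at call time.
    """
    from bark.api import semantic_to_waveform

    # With no history prompt, Bark picks the voice itself, which is
    # exactly what "random" masking wants.
    return semantic_to_waveform(semantics)
```

Because no history prompt is supplied, each call may produce a different voice for the same input tokens.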
## Transfer
Replacing a voice with a voice from another audio clip.

Process:
1. Create a speaker file using the steps under the voice cloning section.
2. Extract the semantic tokens from the clip containing the speech you want re-voiced.
3. Run `semantic_to_waveform` from `bark.api` with the extracted semantics and the speaker file you created in step 1.
4. The previous step returns the generated audio.
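The transfer flow differs from random masking only in passing the speaker file along. This sketch assumes the `bark` package is installed and that `semantic_to_waveform` accepts the speaker `.npz` as its history prompt; the import is deferred so the sketch can be read without Bark installed:

```python
import numpy as np

def replace_voice_with_speaker(semantics: np.ndarray, speaker_path: str) -> np.ndarray:
    """Resynthesize semantic tokens in the voice stored in a speaker file.

    `semantics` comes from the clip whose words you want spoken;
    `speaker_path` points to the .npz speaker file created by the
    voice-cloning steps. Requires the `bark` package at call time.
    """
    from bark.api import semantic_to_waveform

    # Supplying the speaker file as the history prompt makes Bark
    # render the tokens in the cloned voice instead of a random one.
    return semantic_to_waveform(semantics, history_prompt=speaker_path)
```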
46 |
+
# Disclaimer
|
47 |
+
I am not responsible for any misuse of this model. I do not agree with cloning people's voices without permission. Please make sure it is appropriate to clone someone's voice before doing so.
|