GitMylo committed
Commit 537bec2
1 Parent(s): d8e2d6d

Initial readme

Files changed (1):
  1. README.md +44 -0
---
license: mit
pipeline_tag: feature-extraction
tags:
- bark
- tts
- hubert
- text-to-speech
---

# Bark-voice-cloning
Bark-voice-cloning is a model that processes the output of a HuBERT model and turns it into semantic tokens compatible with Bark text-to-speech.

This can be used for many things, including speech transfer and voice cloning.

# Voice cloning
Voice cloning is creating a new voice for text-to-speech.

Process:
1. Load your wav audio file into your PyTorch application.
2. For the fine prompt, extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations) with EnCodec. (Bark uses these to characterize the voice.)
3. For the coarse prompt, take the first two codebooks of the fine prompt: `fine_prompt[:2, :]`.
4. For the semantics, load a HuBERT model without k-means (I personally use the [audiolm-pytorch](https://github.com/lucidrains/audiolm-pytorch) implementation's HubertWithKmeans, but edited to skip the k-means step).
5. Next, to get the actual semantic tokens, run the HuBERT output through this model. The output is compatible with Bark.
6. Save these arrays in an npz with `numpy.savez(path, semantic_prompt=semantics, fine_prompt=fine, coarse_prompt=coarse)`. This is your speaker file containing your cloned voice.
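The slicing and saving steps above can be sketched as follows. The EnCodec and HuBERT stages are represented by stand-in random arrays with hypothetical shapes, since the real extraction depends on your setup; only the coarse-prompt slicing and the npz layout are concrete.

```python
import numpy as np

# Stand-in for a real fine prompt: EnCodec discrete representations,
# shaped (n_codebooks, n_frames). A real pipeline extracts these from
# your wav file with EnCodec (see the link in step 2).
fine_prompt = np.random.randint(0, 1024, size=(8, 512), dtype=np.int64)

# Step 3: the coarse prompt is the first two codebooks of the fine prompt.
coarse_prompt = fine_prompt[:2, :]

# Stand-in for step 5's output: semantic tokens produced by running
# HuBERT features through the Bark-voice-cloning model.
semantic_prompt = np.random.randint(0, 10000, size=(700,), dtype=np.int64)

# Step 6: save the speaker file that Bark can load as a voice prompt.
np.savez("speaker.npz",
         semantic_prompt=semantic_prompt,
         fine_prompt=fine_prompt,
         coarse_prompt=coarse_prompt)
```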

# Voice masking
Voice masking is replacing the voice in an audio clip, for speech-to-speech.

## Random
Replace the voice in an audio clip with a voice generated by Bark.

Process:
1. Extract semantics from the audio clip using HuBERT and this model.
2. Run `semantic_to_waveform` from `bark.api` with the extracted semantics.
3. The previous step returns the generated audio.
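A minimal sketch of step 2, assuming Bark is installed from the suno-ai/bark repository. The semantic tokens are a stand-in random array, and the import is guarded so the sketch stays runnable without Bark.

```python
import numpy as np

try:
    from bark.api import semantic_to_waveform  # Bark's semantics-to-audio helper
except ImportError:
    semantic_to_waveform = None  # Bark not installed; skip synthesis below

# Stand-in for the semantic tokens extracted in step 1 (hypothetical length).
semantics = np.random.randint(0, 10000, size=(500,), dtype=np.int64)

if semantic_to_waveform is not None:
    # Step 2: with no history prompt, Bark generates a random voice.
    audio = semantic_to_waveform(semantics)
```

The returned `audio` is a float waveform at Bark's output sample rate, ready to write out with an audio library of your choice.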
36
+

## Transfer
Replace a voice with a voice from another audio clip.

Process:
1. Create a speaker file using the steps under the voice cloning section.
2. Extract the semantics from the clip containing the speech you want spoken.
3. Run `semantic_to_waveform` from `bark.api` with the extracted semantics and the speaker prompt you created in step 1.
4. The previous step returns the generated audio.
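The transfer flow can be sketched like so, again with a guarded Bark import and a stand-in semantics array. This assumes a Bark version whose `history_prompt` parameter accepts a path to an `.npz` speaker file.

```python
import numpy as np

try:
    from bark.api import semantic_to_waveform
except ImportError:
    semantic_to_waveform = None  # Bark not installed; skip synthesis below

# Step 1: path to the speaker file produced in the voice cloning section.
speaker_file = "speaker.npz"

# Step 2: stand-in for the semantics extracted from the clip whose words
# you want spoken (hypothetical length).
semantics = np.random.randint(0, 10000, size=(500,), dtype=np.int64)

if semantic_to_waveform is not None:
    # Steps 3-4: synthesize the words in the cloned voice by passing the
    # speaker file as the history prompt.
    audio = semantic_to_waveform(semantics, history_prompt=speaker_file)
```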

# Disclaimer
I am not responsible for any misuse of this model. I do not agree with cloning people's voices without permission. Please make sure it is appropriate to clone someone's voice before doing so.