GitMylo/bark-voice-cloning

Feature Extraction

GitMylo committed on May 23, 2023

Commit 28dc103 · 1 Parent(s): dbd5e29

Add squeeze info

Files changed (1)

README.md +1 -1

@@ -27,7 +27,7 @@ Voice cloning is creating a new voice for text-to-speech.
 Process:
 1. Load your wav audio file into your pytorch application
-2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice)
+2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice), **make sure to `.squeeze()` the resulting codes.**
 3. For the coarse prompt do `fine_prompt[:2, :]`, to get the coarse prompt from a fine prompt.
 4. For the semantics, load a HuBERT model without Kmeans (I personally use the [audiolm-pytorch](https://github.com/lucidrains/audiolm-pytorch) implementation's hubertwithkmeans, but i edited it to skip kmeans.)
 5. Next, to get the actual semantic tokens, run the tokens through this model. Your output will be compatible with bark.
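
The effect of the added squeeze note on steps 2 and 3 can be sketched with shapes alone. This is a minimal illustration, not the author's code. It assumes the extracted codes arrive in the `[B, n_q, T]` layout shown in the EnCodec README, with a batch size of 1. The helper name `split_prompts`, the 8 codebooks, the 150 frames, and the index range of 1024 are all placeholders chosen for the example, and a NumPy array stands in for the PyTorch tensor so the snippet runs without loading a model.

```python
import numpy as np


def split_prompts(codes):
    """Hypothetical helper: derive fine and coarse prompts from codec codes.

    `codes` is assumed to have shape [B, n_q, T] with B == 1.
    """
    # Drop only the batch axis. A bare .squeeze() would also remove any
    # other axis of length 1 (for example T == 1), so axis=0 is stricter.
    fine_prompt = np.squeeze(codes, axis=0)  # [n_q, T]
    # Step 3: the coarse prompt is the first two codebooks of the fine prompt.
    coarse_prompt = fine_prompt[:2, :]  # [2, T]
    return fine_prompt, coarse_prompt


# Stand-in data (assumed sizes): 1 batch item, 8 codebooks, 150 frames.
rng = np.random.default_rng(0)
codes = rng.integers(0, 1024, size=(1, 8, 150))

fine_prompt, coarse_prompt = split_prompts(codes)
print(fine_prompt.shape)    # (8, 150)
print(coarse_prompt.shape)  # (2, 150)
```

Without the squeeze, `codes[:2, :]` would slice the batch axis and leave the shape at `(1, 8, 150)`, so no coarse prompt would be extracted. On a PyTorch tensor the equivalent of the `axis=0` call is `codes.squeeze(0)`.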