Add squeeze info
README.md
CHANGED
@@ -27,7 +27,7 @@ Voice cloning is creating a new voice for text-to-speech.
 
 Process:
 1. Load your wav audio file into your pytorch application
-2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice)
+2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice), **make sure to `.squeeze()` the resulting codes.**
 3. For the coarse prompt do `fine_prompt[:2, :]`, to get the coarse prompt from a fine prompt.
 4. For the semantics, load a HuBERT model without Kmeans (I personally use the [audiolm-pytorch](https://github.com/lucidrains/audiolm-pytorch) implementation's hubertwithkmeans, but i edited it to skip kmeans.)
 5. Next, to get the actual semantic tokens, run the tokens through this model. Your output will be compatible with bark.
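The change above matters because EnCodec returns codes with a leading batch dimension, `[1, n_q, T]`, while step 3 slices the fine prompt as a 2-D `[n_q, T]` array. Below is a minimal sketch of steps 2 and 3, assuming the 24 kHz EnCodec model at 6 kbps (8 codebooks), which is taken here to be what bark expects. `extract_codes` follows the encoding recipe from the EnCodec README; the helper names `extract_codes` and `prompts_from_codes` are hypothetical and not part of bark or EnCodec. It uses `squeeze(0)` rather than a bare `.squeeze()` so a very short clip with `T == 1` keeps its time axis.

```python
import torch


def prompts_from_codes(codes: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Turn EnCodec codes of shape [1, n_q, T] into (fine, coarse) prompts."""
    if codes.dim() != 3 or codes.shape[0] != 1:
        raise ValueError(f"expected codes of shape [1, n_q, T], got {tuple(codes.shape)}")
    # Drop the batch dimension: [1, n_q, T] -> [n_q, T].
    # Same as .squeeze() here, except it never removes the time axis when T == 1.
    fine_prompt = codes.squeeze(0)
    # Step 3: the coarse prompt is the first two codebooks of the fine prompt.
    coarse_prompt = fine_prompt[:2, :]
    return fine_prompt, coarse_prompt


def extract_codes(wav_path: str, bandwidth: float = 6.0) -> torch.Tensor:
    """Encode a wav file with the 24 kHz EnCodec model and return [1, n_q, T] codes."""
    # Imported lazily so prompts_from_codes works without encodec/torchaudio installed.
    import torchaudio
    from encodec import EncodecModel
    from encodec.utils import convert_audio

    model = EncodecModel.encodec_model_24khz()
    model.set_target_bandwidth(bandwidth)  # 6 kbps -> 8 codebooks (assumed to match bark)
    wav, sr = torchaudio.load(wav_path)
    # Resample/remix to the model's format, then add a batch dimension.
    wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)
    with torch.no_grad():
        encoded_frames = model.encode(wav)
    # Each encoded frame is a (codes, scale) pair; concatenate codes along time.
    return torch.cat([frame_codes for frame_codes, _ in encoded_frames], dim=-1)


if __name__ == "__main__":
    # Random stand-in for extract_codes("voice.wav"): 8 codebooks, 150 frames.
    fake_codes = torch.randint(0, 1024, (1, 8, 150))
    fine, coarse = prompts_from_codes(fake_codes)
    print(tuple(fine.shape), tuple(coarse.shape))  # (8, 150) (2, 150)
```

With a real recording you would call `prompts_from_codes(extract_codes("voice.wav"))`, where the path is a placeholder. If the prompts are saved into a bark voice file, converting them with `.cpu().numpy()` first is likely needed; that storage detail is an assumption, not something this commit specifies.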