# 🐶 Bark

Bark is a multilingual TTS model created by [Suno-AI](https://www.suno.ai/). It can generate conversational speech as well as music and sound effects.

It is architecturally very similar to Google's [AudioLM](https://arxiv.org/abs/2209.03143). For more information, please refer to [Suno-AI's repo](https://github.com/suno-ai/bark).
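
To make the AudioLM-style design more concrete, Bark works in stages. It first turns text into discrete semantic tokens. It then predicts a few coarse acoustic codebooks, and finally fills in the remaining fine codebooks. A neural audio codec (EnCodec in Bark's case) decodes the result into a waveform.

The sketch below is a toy illustration of that data flow only. The stage functions are hypothetical stand-ins that emit pseudo-random tokens instead of running models. The vocabulary and codebook sizes are assumptions chosen for illustration, not values read from Bark.

```python
import random

# Illustrative sizes (assumptions for this toy sketch, not read from Bark).
SEMANTIC_VOCAB = 10_000  # number of distinct semantic tokens
CODEBOOK_SIZE = 1_024    # entries in each acoustic codebook
N_COARSE = 2             # codebooks predicted by the coarse stage
N_FINE = 8               # total codebooks after the fine stage


def text_to_semantic(text: str, n_tokens: int) -> list[int]:
    """Hypothetical stage 1: text -> semantic token sequence."""
    rng = random.Random(text)  # seeded so the toy output is repeatable
    return [rng.randrange(SEMANTIC_VOCAB) for _ in range(n_tokens)]


def semantic_to_coarse(semantic: list[int], n_frames: int) -> list[list[int]]:
    """Hypothetical stage 2: semantic tokens -> first N_COARSE codebooks per frame."""
    rng = random.Random(sum(semantic))
    return [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
            for _ in range(N_COARSE)]


def coarse_to_fine(coarse: list[list[int]]) -> list[list[int]]:
    """Hypothetical stage 3: keep the coarse codebooks, predict the remaining ones."""
    n_frames = len(coarse[0])
    rng = random.Random(n_frames)
    remaining = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
                 for _ in range(N_FINE - N_COARSE)]
    return coarse + remaining


if __name__ == "__main__":
    semantic = text_to_semantic("Hello, my name is Manmay, how are you?", n_tokens=50)
    coarse = semantic_to_coarse(semantic, n_frames=75)
    fine = coarse_to_fine(coarse)
    # A real system would now decode `fine` (codebooks x frames) to audio with a codec.
    print(len(semantic), len(coarse), len(fine), len(fine[0]))  # 50 2 8 75
```

Each stage only adds one level of acoustic detail. In Bark these stages are learned models rather than the random stand-ins used here.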

## Acknowledgements

- 👑[Suno-AI](https://www.suno.ai/) for training and open-sourcing this model.
- 👑[gitmylo](https://github.com/gitmylo) for finding [the solution](https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer/) to the semantic token generation for voice clones and finetunes.
- 👑[serp-ai](https://github.com/serp-ai/bark-with-voice-clone) for controlled voice cloning.

## Example Use

```python
from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

text = "Hello, my name is Manmay, how are you?"

config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# with a random speaker
output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)

# cloning a speaker
# This assumes a speaker file exists at `bark_voices/ljspeech/speaker.wav`
# or `bark_voices/ljspeech/speaker.npz`, i.e. `bark_voices/<speaker_id>/...`.
output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")
```
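
The cloning example relies on a per-speaker folder under `voice_dirs`. The helper below is a hypothetical convenience function, not part of 🐸TTS. It copies a reference recording into the `bark_voices/<speaker_id>/speaker.wav` layout described in the comments above. Accepting `.mp3` as well is an assumption based on the 🐸TTS API notes on this page.

```python
import shutil
from pathlib import Path


def prepare_voice_dir(voice_root: str, speaker_id: str, source_audio: str) -> Path:
    """Hypothetical helper: copy a reference clip to <voice_root>/<speaker_id>/speaker.<ext>."""
    src = Path(source_audio)
    ext = src.suffix.lower()
    if ext not in {".wav", ".mp3"}:
        raise ValueError(f"expected a .wav or .mp3 file, got {src.name!r}")
    speaker_dir = Path(voice_root) / speaker_id
    speaker_dir.mkdir(parents=True, exist_ok=True)  # create <voice_root>/<speaker_id>/ if missing
    dest = speaker_dir / f"speaker{ext}"
    shutil.copyfile(src, dest)  # overwrites an earlier clip with the same name
    return dest
```

For example, `prepare_voice_dir("bark_voices", "ljspeech", "my_recording.wav")` (with a hypothetical `my_recording.wav`) creates `bark_voices/ljspeech/speaker.wav`. After that, `speaker_id="ljspeech", voice_dirs="bark_voices/"` points `model.synthesize` at the clip. If a `speaker.npz` from an earlier clip is already in that folder, it may be reused instead, so consider deleting it when you swap recordings.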

Using 🐸TTS API:

```python
from TTS.api import TTS

# Load the model to GPU.
# Bark is really slow on CPU, so we recommend using a GPU.
tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=True)

# Cloning a new speaker
# This expects to find an mp3 or wav file like `bark_voices/ljspeech/speaker.wav`.
# It computes the cloning values and stores them in `bark_voices/ljspeech/speaker.npz`.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="bark_voices/",
                speaker="ljspeech")

# When you run it again, it uses the stored values to generate the voice.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="bark_voices/",
                speaker="ljspeech")

# random speaker (reuses the model loaded above)
tts.tts_to_file("hello world", file_path="out.wav")
```
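
The two `tts_to_file` calls above differ only in whether `speaker.npz` already exists. The function below is a hypothetical sketch of that lookup order as the comments describe it: the cached `.npz` is checked first, and otherwise a `.wav` or `.mp3` clip is used for cloning. It is not 🐸TTS's own implementation, and it assumes the fixed `speaker.*` file names used in these examples.

```python
from pathlib import Path


def resolve_voice_source(voice_dir: str, speaker: str) -> Path:
    """Hypothetical: return the file a cloning run would start from."""
    speaker_dir = Path(voice_dir) / speaker
    cached = speaker_dir / "speaker.npz"
    if cached.is_file():
        return cached  # values computed on an earlier run; reused as-is
    for ext in (".wav", ".mp3"):
        clip = speaker_dir / f"speaker{ext}"
        if clip.is_file():
            return clip  # must be cloned first; the .npz is written afterwards
    raise FileNotFoundError(f"no speaker.npz, speaker.wav or speaker.mp3 in {speaker_dir}")
```

Under these assumptions, `resolve_voice_source("bark_voices/", "ljspeech")` points at `speaker.wav` before the first run and at `speaker.npz` after it. This matches the comment that the second call reuses the stored values.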

Using 🐸TTS command line:

```console
# cloning the `ljspeech` voice
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav" \
    --voice_dir bark_voices/ \
    --speaker_idx "ljspeech" \
    --progress_bar True

# Random voice generation
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav" \
    --progress_bar True
```
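
When scripting many CLI runs, it can help to build the argument list in code. The sketch below only assembles the flags already shown in the two commands above (`--model_name`, `--text`, `--out_path`, `--voice_dir`, `--speaker_idx`, `--progress_bar`). The function name `bark_cli_args` and its defaults are hypothetical. You can pass the resulting list to `subprocess.run(args, check=True)`, assuming the `tts` executable is on your `PATH`.

```python
import shlex
from typing import Optional

BARK_MODEL = "tts_models/multilingual/multi-dataset/bark"


def bark_cli_args(text: str, out_path: str, speaker: Optional[str] = None,
                  voice_dir: str = "bark_voices/", progress_bar: bool = True) -> list[str]:
    """Hypothetical helper: build the `tts` command for Bark.

    With `speaker` set, it adds --voice_dir/--speaker_idx (the cloning form);
    otherwise it produces the random-voice form.
    """
    args = ["tts", "--model_name", BARK_MODEL, "--text", text, "--out_path", out_path]
    if speaker is not None:
        args += ["--voice_dir", voice_dir, "--speaker_idx", speaker]
    args += ["--progress_bar", str(progress_bar)]  # the CLI examples pass "True"
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command line instead of running it.
    print(shlex.join(bark_cli_args("This is an example.", "output.wav", speaker="ljspeech")))
```

Building a list rather than one shell string avoids quoting problems when the text contains spaces or punctuation.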

## Important resources & papers

- Original Repo: https://github.com/suno-ai/bark
- Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone
- AudioLM: https://arxiv.org/abs/2209.03143

## BarkConfig

```{eval-rst}
.. autoclass:: TTS.tts.configs.bark_config.BarkConfig
    :members:
```

## Bark Model

```{eval-rst}
.. autoclass:: TTS.tts.models.bark.Bark
    :members:
```