Audiocraft

Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.

MusicGen

Audiocraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. Check out our sample page or test the available demo!

Installation

Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
pip install 'torch>=2.0'
# Then proceed to one of the following
pip install -U audiocraft  # stable release
pip install -U git+https://[email protected]/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
pip install -e .  # or if you cloned the repo locally

Usage

We offer a number of way to interact with MusicGen:

You can play with MusicGen by running the jupyter notebook at demo.ipynb locally, or use the provided colab notebook.
You can use the gradio demo locally by running python app.py.
Finally, a demo is also available on the facebook/MusicGen HuggingFace Space (huge thanks to all the HF team for their support).

API

We provide a simple API and 4 pre-trained models. The pre trained models are:

small: 300M model, text to music only - 🤗 Hub
medium: 1.5B model, text to music only - 🤗 Hub
melody: 1.5B model, text to music and text+melody to music - 🤗 Hub
large: 3.3B model, text to music only - 🤗 Hub

We observe the best trade-off between quality and compute with the medium or melody model. In order to use MusicGen locally you must have a GPU. We recommend 16GB of memory, but smaller GPUs will be able to generate short sequences, or longer sequences with the small model.

Note: Please make sure to have ffmpeg installed when using newer version of torchaudio. You can install it with:

apt-get install ffmpeg

See after a quick example for using the API.

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions)  # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")

Model Card

See the model card page.

FAQ

Will the training code be released?

Yes. We will soon release the training code for MusicGen and EnCodec.

Citation

@article{copet2023simple,
      title={Simple and Controllable Music Generation},
      author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
      year={2023},
      journal={arXiv preprint arXiv:2306.05284},
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.
The weights in this repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.