# MusicGen
Welcome to MusicGen's demo Jupyter notebook. It contains a series of self-contained examples showing how to use MusicGen in different settings.

First, we initialize MusicGen. You can choose a model from the following selection:
1. `small` - 300M transformer decoder.
2. `medium` - 1.5B transformer decoder.
3. `melody` - 1.5B transformer decoder also supporting melody conditioning.
4. `large` - 3.3B transformer decoder.

We will use the `small` variant for the purpose of this demonstration.

In [1]:
!pip install git+https://github.com/facebookresearch/audiocraft.git



In [2]:
%pip install audiocraft



In [3]:
#import audiocraft
from audiocraft.models import MusicGen

# Using small model, better results would be obtained with `medium` or `large`.
model = MusicGen.get_pretrained('small')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





Next, let us configure the generation parameters. Specifically, you can control the following:
* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.
* `top_k` (int, optional): top_k used for sampling. Defaults to 250.
* `top_p` (float, optional): top_p used for sampling, when set to 0 top_k is used. Defaults to 0.0.
* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.
* `duration` (float, optional): duration of the generated waveform, in seconds. Defaults to 30.0.
* `cfg_coef` (float, optional): coefficient used for classifier free guidance. Defaults to 3.0.

Any parameter you do not pass falls back to its default value.
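
To make the roles of `use_sampling`, `top_k`, `temperature` and `cfg_coef` more concrete, here is a minimal pure-Python sketch of top-k sampling with classifier-free guidance. It is an illustration of the general techniques, not audiocraft's actual implementation; the helper names `guided_logits` and `sample_top_k` and the toy logits are assumptions made for this example.

```python
import math
import random


def guided_logits(cond, uncond, cfg_coef=3.0):
    """Classifier-free guidance: move away from the unconditional logits."""
    return [u + cfg_coef * (c - u) for c, u in zip(cond, uncond)]


def sample_top_k(logits, top_k=250, temperature=1.0, use_sampling=True, rng=random):
    """Pick one token index from raw logits (hypothetical helper)."""
    if not use_sampling:
        # Greedy (argmax) decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep only the k highest-scoring candidates.
    k = min(top_k, len(logits))
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the kept candidates.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy logits over a 4-token vocabulary (made up for illustration).
cond = [2.0, 0.5, -1.0, 0.0]
uncond = [1.0, 0.5, 0.0, 0.0]
logits = guided_logits(cond, uncond, cfg_coef=3.0)
greedy = sample_top_k(logits, use_sampling=False)
sampled = sample_top_k(logits, top_k=2, temperature=1.0, rng=random.Random(0))
```

A larger `cfg_coef` amplifies the difference between the conditional and unconditional predictions, a smaller `top_k` restricts sampling to fewer candidates, and a lower `temperature` concentrates probability on the highest-scoring ones.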

In [4]:
model.set_generation_params(
 use_sampling=True,
 top_k=250,
 duration=5
)
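
As a rough guide to what `duration` costs, the sketch below converts a duration in seconds into audio samples and token frames. It assumes the figures published for MusicGen (a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz) apply to the checkpoint in use; the function name `generation_budget` is hypothetical.

```python
SAMPLE_RATE = 32_000   # audio samples per second (assumed from the MusicGen description)
FRAME_RATE = 50        # token frames per second (assumed)
NUM_CODEBOOKS = 4      # parallel token streams per frame (assumed)


def generation_budget(duration_s):
    """Estimate how much audio and how many tokens a duration implies."""
    frames = int(duration_s * FRAME_RATE)
    return {
        "audio_samples": int(duration_s * SAMPLE_RATE),
        "token_frames": frames,
        "total_tokens": frames * NUM_CODEBOOKS,
    }


budget = generation_budget(5)
```

With `duration=5` as set above, this gives 160,000 audio samples decoded from 250 token frames.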

Next, we can go ahead and start generating music using one of the following modes:
* Unconditional samples using `model.generate_unconditional`
* Music continuation using `model.generate_continuation`
* Text-conditional samples using `model.generate`
* Melody-conditional samples using `model.generate_with_chroma`

### Unconditional Generation

In [5]:
from audiocraft.utils.notebook import display_audio

output = model.generate_unconditional(num_samples=2, progress=True)
display_audio(output, sample_rate=32000)



### Music Continuation

In [6]:
import math
import torchaudio
import torch
from audiocraft.utils.notebook import display_audio

def get_bip_bip(bip_duration=0.125, frequency=440,
                duration=0.5, sample_rate=32000, device="cuda"):
    """Generates a series of bips at the given frequency."""
    # Time axis in seconds, created on the requested device.
    t = torch.arange(
        int(duration * sample_rate), device=device, dtype=torch.float) / sample_rate
    wav = torch.cos(2 * math.pi * frequency * t)[None]
    # Square gate: silent for the first half of each 2 * bip_duration period.
    tp = (t % (2 * bip_duration)) / (2 * bip_duration)
    envelope = (tp >= 0.5).float()
    return wav * envelope
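
The gating logic in `get_bip_bip` can be checked without a GPU. The following pure-Python sketch reproduces only the envelope computation (the name `bip_envelope` is hypothetical) and counts how many samples are audible with the default arguments.

```python
def bip_envelope(bip_duration=0.125, duration=0.5, sample_rate=32000):
    """Return the 0/1 gate applied to the tone, one value per sample."""
    n = int(duration * sample_rate)
    gate = []
    for i in range(n):
        t = i / sample_rate
        # Position inside the current on/off period, in [0, 1).
        phase = (t % (2 * bip_duration)) / (2 * bip_duration)
        gate.append(1.0 if phase >= 0.5 else 0.0)
    return gate


gate = bip_envelope()
on_samples = int(sum(gate))
```

With the defaults, each 0.25 s period is silent for its first 0.125 s and audible for the rest, so half of the 16,000 samples carry the tone.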


In [7]:
# Here we use a synthetic signal to prompt both the tonality and the BPM
# of the generated audio.
res = model.generate_continuation(
 get_bip_bip(0.125).expand(2, -1, -1),
 32000, ['Jazz jazz and only jazz',
 'Heartful EDM with beautiful synths and chords'],
 progress=True)
display_audio(res, 32000)



In [9]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [11]:
# You can also use any audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load("/content/drive/MyDrive/Colab Notebooks/audio_output/dataset_example_electro_2.mp3")
prompt_duration = 2
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True)
display_audio(output, sample_rate=32000)



### Text-conditional Generation

In [12]:
from audiocraft.utils.notebook import display_audio

output = model.generate(
 descriptions=[
 'a funky house with 80s hip hop vibes',
 '90s rock song with loud guitars and heavy drums',
 ],
 progress=True
)
display_audio(output, sample_rate=32000)



In [None]:
# !pip install audiocraft

### Melody-conditional Generation

In [13]:
import torchaudio
from audiocraft.utils.notebook import display_audio

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)

melody_waveform, sr = torchaudio.load("/content/drive/MyDrive/Colab Notebooks/audio_output/dataset_example_electro_2.mp3")
melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)
output = model.generate_with_chroma(
 descriptions=[
 '80s pop track with bassy drums and synth',
 '90s rock song with loud guitars and heavy drums',
 ],
 melody_wavs=melody_waveform,
 melody_sample_rate=sr,
 progress=True
)
display_audio(output, sample_rate=32000)
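
As the name `generate_with_chroma` suggests, melody conditioning relies on a chroma representation of the reference audio, which folds pitch information into 12 pitch classes regardless of octave. The sketch below only illustrates that folding for single frequencies; it is not how audiocraft extracts chroma features, and `pitch_class` is a hypothetical helper.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, a4_hz=440.0):
    """Map a frequency to its nearest pitch class, ignoring the octave."""
    # MIDI note number: A4 (440 Hz) is note 69, 12 semitones per octave.
    midi = 69 + 12 * math.log2(frequency_hz / a4_hz)
    return PITCH_CLASSES[round(midi) % 12]


names = [pitch_class(f) for f in (440.0, 880.0, 261.63, 329.63)]
```

Because 440 Hz and 880 Hz land in the same pitch class, a chroma-based condition captures the melodic contour of the reference without fixing the octave or timbre.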








In [15]:
import locale
# Colab workaround: force UTF-8 so that shell commands (`!...`) keep working.
locale.getpreferredencoding = lambda: "UTF-8"

In [16]:
# Install PyTorch
!pip install 'torch==2.1.0'



In [17]:
!sudo apt-get install ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.


In [18]:
! pip install streamlit -q


In [19]:
!wget -q -O - ipv4.icanhazip.com

34.16.174.223


In [52]:
%%writefile app.py
import streamlit as st
import torch
import torchaudio
from audiocraft.models import MusicGen
import os
import numpy as np
import base64

genres = ["Pop", "Rock", "Jazz", "Electronic", "Hip-Hop", "Classical", "Lofi", "Chillpop"]

@st.cache_resource()
def load_model():
    # Cached so the checkpoint is loaded only once per Streamlit server process.
    model = MusicGen.get_pretrained('facebook/musicgen-small')
    return model

def generate_music_tensors(descriptions, duration: int):
    model = load_model()

    model.set_generation_params(
        use_sampling=True,
        top_k=250,
        duration=duration
    )

    with st.spinner("Generating Music..."):
        # With return_tokens=True, generate() returns a tuple (audio, tokens).
        output = model.generate(
            descriptions=descriptions,
            progress=True,
            return_tokens=True
        )

    st.success("Music Generation Complete!")
    return output


def save_audio(samples: torch.Tensor):
    sample_rate = 32000  # same rate as used with display_audio elsewhere in this notebook
    save_path = "/content/drive/MyDrive/Colab Notebooks/audio_output"
    assert samples.dim() == 2 or samples.dim() == 3

    samples = samples.detach().cpu()
    if samples.dim() == 2:
        # Add a batch dimension so a single sample is handled like a batch.
        samples = samples[None, ...]

    for idx, audio in enumerate(samples):
        audio_path = os.path.join(save_path, f"audio_{idx}.wav")
        torchaudio.save(audio_path, audio, sample_rate)

def get_binary_file_downloader_html(bin_file, file_label='File'):
    # Embed the file as a base64 data URI so it can be downloaded from the page.
    with open(bin_file, 'rb') as f:
        data = f.read()
    bin_str = base64.b64encode(data).decode()
    href = f'<a href="data:application/octet-stream;base64,{bin_str}" download="{os.path.basename(bin_file)}">Download {file_label}</a>'
    return href

st.set_page_config(
 page_icon= "musical_note",
 page_title= "Music Gen"
)

def main():
    with st.sidebar:
        st.header("""⚙️Generate Music ⚙️""", divider="rainbow")
        st.text("")
        st.subheader("1. Enter your music description.......")
        bpm = st.number_input("Enter Speed in BPM", min_value=60)

        text_area = st.text_area('Ex : 80s rock song with guitar and drums')
        st.text('')
        # Dropdown for genres
        selected_genre = st.selectbox("Select Genre", genres)

        st.subheader("2. Select time duration (In Seconds)")
        # Minimum of 1 second: a duration of 0 cannot be generated.
        time_slider = st.slider("Select time duration (In Seconds)", 1, 60, 10)

    st.title("""🎵 Song Lab AI 🎵""")
    st.text('')
    left_co, right_co = st.columns(2)
    left_co.write("""Music Generation through a prompt""")
    left_co.write("""PS : First generation may take some time .......""")

    if st.sidebar.button('Generate !'):
        with left_co:
            for _ in range(6):
                st.text('')  # vertical spacing
            st.subheader("Generated Music")

            # Generate audio
            descriptions = [f"{text_area} {selected_genre} {bpm} BPM" for _ in range(5)]  # Adjust the batch size (5 in this case)
            music_tensors = generate_music_tensors(descriptions, time_slider)

            # generate_music_tensors returns (audio, tokens); keep the audio batch.
            audio_batch = music_tensors[0]
            save_audio(audio_batch)

            # Only play the first sample of the batch.
            idx = 0
            audio_filepath = f'/content/drive/MyDrive/Colab Notebooks/audio_output/audio_{idx}.wav'
            with open(audio_filepath, 'rb') as audio_file:
                audio_bytes = audio_file.read()

            # Play the full audio
            st.audio(audio_bytes, format='audio/wav')
            st.markdown(get_binary_file_downloader_html(audio_filepath, f'Audio_{idx}'), unsafe_allow_html=True)


if __name__ == "__main__":
    main()



Overwriting app.py
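
The download link in `app.py` works by embedding the whole file in the page as a base64 data URI. The sketch below exercises the same idea outside Streamlit on a temporary file; `download_link` is a hypothetical stand-in for `get_binary_file_downloader_html`, and the file content is placeholder bytes rather than a real WAV.

```python
import base64
import os
import tempfile


def download_link(path, label="File"):
    """Build an HTML anchor that carries the file content as a data URI."""
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode()
    name = os.path.basename(path)
    return (f'<a href="data:application/octet-stream;base64,{payload}" '
            f'download="{name}">Download {label}</a>')


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "audio_0.wav")
    with open(demo_path, "wb") as f:
        f.write(b"RIFF....WAVE")  # placeholder bytes, not a valid WAV file
    link = download_link(demo_path, "Audio_0")

# Decode the payload back out of the link to confirm it round-trips.
encoded = link.split("base64,")[1].split('"')[0]
decoded = base64.b64decode(encoded)
```

Since the file is inlined into the HTML, the page grows with the audio length; for long clips, Streamlit's `st.download_button` may be a lighter alternative.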


In [53]:
!streamlit run app.py & npx localtunnel --port 8501

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8501
  External URL: http://34.16.174.223:8501

npx: installed 22 in 2.151s
your url is: https://silly-sloths-guess.loca.lt
2024-01-31 05:34:06.593717: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-31 05:34:06.593772: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-31 05:34:06.595191: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS facto