Update app.py
Browse files
app.py
CHANGED
@@ -1,22 +1,16 @@
|
|
|
|
|
|
|
|
|
|
1 |
from queue import Queue
|
2 |
from threading import Thread
|
3 |
from typing import Optional
|
4 |
-
|
5 |
-
import numpy as np
|
6 |
-
import torch
|
7 |
-
|
8 |
from transformers import MusicgenForConditionalGeneration, MusicgenProcessor, set_seed
|
9 |
from transformers.generation.streamers import BaseStreamer
|
10 |
|
11 |
-
import gradio as gr
|
12 |
-
import spaces
|
13 |
-
|
14 |
-
|
15 |
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
|
16 |
processor = MusicgenProcessor.from_pretrained("facebook/musicgen-small")
|
17 |
-
|
18 |
title = "MusicGen Streaming"
|
19 |
-
|
20 |
description = """
|
21 |
Stream the outputs of the MusicGen text-to-music model by playing the generated audio as soon as the first chunk is ready.
|
22 |
Demo uses [MusicGen Small](https://huggingface.co/facebook/musicgen-small) in the 🤗 Transformers library. Note that the
|
@@ -30,18 +24,6 @@ At each decoding step, the model generates a new set of audio codes, conditional
|
|
30 |
frame rate of the [EnCodec model](https://huggingface.co/facebook/encodec_32khz) used to decode the generated codes to audio waveform,
|
31 |
each set of generated audio codes corresponds to 0.02 seconds. This means we require a total of 1000 decoding steps to generate
|
32 |
20 seconds of audio.
|
33 |
-
Rather than waiting for the entire audio sequence to be generated, which would require the full 1000 decoding steps, we can start
|
34 |
-
playing the audio after a specified number of decoding steps has been reached, a technique known as [*streaming*](https://huggingface.co/docs/transformers/main/en/generation_strategies#streaming).
|
35 |
-
For example, after 250 steps we have the first 5 seconds of audio ready, and so can play this without waiting for the remaining
|
36 |
-
750 decoding steps to be complete. As we continue to generate with the MusicGen model, we append new chunks of generated audio
|
37 |
-
to our output waveform on-the-fly. After the full 1000 decoding steps, the generated audio is complete, and is composed of four
|
38 |
-
chunks of audio, each corresponding to 250 tokens.
|
39 |
-
This method of playing incremental generations reduces the latency of the MusicGen model from the total time to generate 1000 tokens,
|
40 |
-
to the time taken to play the first chunk of audio (250 tokens). This can result in significant improvements to perceived latency,
|
41 |
-
particularly when the chunk size is chosen to be small. In practice, the chunk size should be tuned to your device: using a
|
42 |
-
smaller chunk size will mean that the first chunk is ready faster, but should not be chosen so small that the model generates slower
|
43 |
-
than the time it takes to play the audio.
|
44 |
-
For details on how the streaming class works, check out the source code for the [MusicgenStreamer](https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming/blob/main/app.py#L52).
|
45 |
"""
|
46 |
|
47 |
|
@@ -229,5 +211,4 @@ demo = gr.Interface(
|
|
229 |
cache_examples=False,
|
230 |
)
|
231 |
|
232 |
-
|
233 |
demo.queue().launch()
|
|
|
1 |
+
import numpy as np
|
2 |
+
import torch
|
3 |
+
import gradio as gr
|
4 |
+
import spaces
|
5 |
from queue import Queue
|
6 |
from threading import Thread
|
7 |
from typing import Optional
|
|
|
|
|
|
|
|
|
8 |
from transformers import MusicgenForConditionalGeneration, MusicgenProcessor, set_seed
|
9 |
from transformers.generation.streamers import BaseStreamer
|
10 |
|
|
|
|
|
|
|
|
|
11 |
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
|
12 |
processor = MusicgenProcessor.from_pretrained("facebook/musicgen-small")
|
|
|
13 |
title = "MusicGen Streaming"
|
|
|
14 |
description = """
|
15 |
Stream the outputs of the MusicGen text-to-music model by playing the generated audio as soon as the first chunk is ready.
|
16 |
Demo uses [MusicGen Small](https://huggingface.co/facebook/musicgen-small) in the 🤗 Transformers library. Note that the
|
|
|
24 |
frame rate of the [EnCodec model](https://huggingface.co/facebook/encodec_32khz) used to decode the generated codes to audio waveform,
|
25 |
each set of generated audio codes corresponds to 0.02 seconds. This means we require a total of 1000 decoding steps to generate
|
26 |
20 seconds of audio.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
27 |
"""
|
28 |
|
29 |
|
|
|
211 |
cache_examples=False,
|
212 |
)
|
213 |
|
|
|
214 |
demo.queue().launch()
|