|
|
# Pop2Piano
|
|
|
Pop2Piano is a Transformer network that generates piano covers from the waveforms of pop music.
|
# Model Details |
|
|
|
Pop2Piano was proposed in the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee. |
|
|
|
Piano covers of pop music are widely enjoyed, but generating them is not a trivial task. It requires great expertise in playing the piano as well as knowledge of a song's different characteristics and melodies. With Pop2Piano you can generate a piano cover directly from a song's audio waveform. It is the first model to generate a piano cover directly from pop audio without melody and chord extraction modules.
|
|
|
Pop2Piano is an encoder-decoder Transformer model based on [T5](https://arxiv.org/pdf/1910.10683.pdf). The input audio is converted to a log-mel spectrogram and passed to the encoder, which transforms it into a latent representation. The decoder uses these latent representations to generate token ids in an autoregressive way. Each token id corresponds to one of four token types: time, velocity, note and 'special'. The token ids are then decoded into a MIDI file.
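
To make these token types concrete, here is a purely illustrative sketch of how a stream of relative time, velocity and note events could be rendered into a MIDI file with `pretty_midi`. This is not the actual `Pop2PianoTokenizer` logic; the event values and the step size are made up for illustration.

```python
import pretty_midi

# Hypothetical event stream in the spirit of the four token types above.
# In the real model these come from token ids decoded by Pop2PianoTokenizer;
# here they are hand-written (type, value) pairs purely for illustration.
events = [
    ("time", 2),        # advance the clock by 2 steps
    ("velocity", 77),   # set the current velocity
    ("note", 60),       # note-on for middle C at the current time/velocity
    ("time", 4),
    ("velocity", 0),    # velocity 0 -> subsequent note events act as note-offs
    ("note", 60),       # note-off for middle C
]

seconds_per_step = 0.125  # assumed step size, for illustration only
current_time, current_velocity = 0.0, 64
active = {}  # pitch -> (onset time, velocity)

midi = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # acoustic grand piano

for kind, value in events:
    if kind == "time":
        current_time += value * seconds_per_step
    elif kind == "velocity":
        current_velocity = value
    elif kind == "note":
        if current_velocity > 0:  # note-on
            active[value] = (current_time, current_velocity)
        elif value in active:     # note-off
            start, vel = active.pop(value)
            piano.notes.append(
                pretty_midi.Note(velocity=vel, pitch=value, start=start, end=current_time)
            )

midi.instruments.append(piano)
midi.write("toy_decoded.mid")
```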
|
|
|
## Model Sources |
|
|
|
- [**Paper**](https://arxiv.org/abs/2211.00895) |
|
- [**Original Repository**](https://github.com/sweetcocoa/pop2piano) |
|
- [**HuggingFace Space Demo**](https://huggingface.co/spaces/sweetcocoa/pop2piano) |
|
|
|
# Usage |
|
|
|
To use Pop2Piano, you will need to install the 🤗 Transformers library, as well as the following third party modules: |
|
|
|
``` |
|
pip install git+https://github.com/huggingface/transformers.git
|
pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy |
|
``` |
|
Please note that you may need to restart your runtime after installation. |
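
As a quick, optional sanity check after installation (and after restarting the runtime), you can verify that the third-party modules and the Pop2Piano classes import cleanly:

```python
>>> import essentia
>>> import librosa
>>> import pretty_midi
>>> import scipy

>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
```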
|
|
|
## Pop music to Piano |
|
|
|
### Code Example |
|
|
|
- Using your own Audio |
|
|
|
```python |
|
>>> import librosa |
|
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor |
|
|
|
>>> audio, sr = librosa.load("<your_audio_file_here>", sr=44100) # feel free to change the sr to a suitable value. |
|
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano") |
|
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano") |
|
|
|
>>> inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt") |
|
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1") |
|
>>> tokenizer_output = processor.batch_decode( |
|
... token_ids=model_output, feature_extractor_output=inputs |
|
... )["pretty_midi_objects"][0] |
|
>>> tokenizer_output.write("./Outputs/midi_output.mid") |
|
``` |
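
If you want to listen to the result without a separate MIDI player, one option is to render the `PrettyMIDI` object to a waveform. The following is a minimal sketch using `pretty_midi`'s built-in sine-wave synthesis and `scipy`; like the example above, it assumes the `./Outputs` directory already exists.

```python
>>> import numpy as np
>>> import scipy.io.wavfile

>>> rendered = tokenizer_output.synthesize(fs=44100)  # simple sine-wave rendering of the MIDI
>>> scipy.io.wavfile.write("./Outputs/midi_output.wav", 44100, rendered.astype(np.float32))
```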
|
|
|
- Audio from Hugging Face Hub |
|
|
|
```python |
|
>>> from datasets import load_dataset |
|
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor |
|
|
|
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano") |
|
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano") |
|
>>> ds = load_dataset("sweetcocoa/pop2piano_ci", split="test") |
|
|
|
>>> inputs = processor( |
|
... audio=ds["audio"][0]["array"], sampling_rate=ds["audio"][0]["sampling_rate"], return_tensors="pt" |
|
... ) |
|
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1") |
|
>>> tokenizer_output = processor.batch_decode( |
|
... token_ids=model_output, feature_extractor_output=inputs |
|
... )["pretty_midi_objects"][0] |
|
>>> tokenizer_output.write("./Outputs/midi_output.mid") |
|
``` |
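
Both examples above process a single clip. Based on the current `Pop2PianoProcessor` implementation, several clips can also be processed in one batch by passing lists of audio arrays and sampling rates together with an attention mask; treat the following as a sketch (the file paths are placeholders):

```python
>>> import librosa
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

>>> audio1, sr1 = librosa.load("<first_audio_file>", sr=44100)   # placeholder path
>>> audio2, sr2 = librosa.load("<second_audio_file>", sr=44100)  # placeholder path

>>> inputs = processor(
...     audio=[audio1, audio2],
...     sampling_rate=[sr1, sr2],
...     return_attention_mask=True,  # needed so generate() can ignore padding across the batch
...     return_tensors="pt",
... )
>>> model_output = model.generate(
...     input_features=inputs["input_features"],
...     attention_mask=inputs["attention_mask"],
...     composer="composer1",
... )
>>> midi_objects = processor.batch_decode(
...     token_ids=model_output, feature_extractor_output=inputs
... )["pretty_midi_objects"]
>>> for i, midi in enumerate(midi_objects):
...     midi.write(f"./Outputs/midi_output_{i}.mid")
```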
|
|
|
## Example |
|
Here is an example of the original pop audio along with the piano cover MIDI generated from it (rendered as audio).
|
|
|
- Actual Pop Music |
|
|
|
<audio controls> |
|
<source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/0/audio/audio.mp3" type="audio/mpeg"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
|
|
- Generated MIDI |
|
|
|
<audio controls> |
|
<source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/1/audio/audio.mp3" type="audio/mpeg"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
|
|
## Tips |
|
|
|
1. Pop2Piano is an encoder-decoder based model, like T5.
2. Pop2Piano can be used to generate MIDI files for a given audio sequence.
3. Choosing different composers in `Pop2PianoForConditionalGeneration.generate()` can lead to a variety of different results (a sketch for listing the available composer names follows this list).
4. Setting the sampling rate to 44.1 kHz when loading the audio file can give good performance.
5. Though Pop2Piano was mainly trained on Korean pop music, it also performs quite well on Western pop and hip-hop songs.
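
Regarding tip 3, the composer names accepted by `generate()` appear to be stored in the model's generation config (an assumption based on the current `transformers` implementation); a quick way to list them:

```python
>>> from transformers import Pop2PianoForConditionalGeneration

>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> # Assumed attribute: mapping from composer name to its conditioning token id.
>>> print(list(model.generation_config.composer_to_feature_token))
```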
|
|
|
|
|
# Citation |
|
|
|
**BibTeX:** |
|
``` |
|
@misc{choi2023pop2piano, |
|
title={Pop2Piano : Pop Audio-based Piano Cover Generation}, |
|
author={Jongho Choi and Kyogu Lee}, |
|
year={2023}, |
|
eprint={2211.00895}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.SD} |
|
} |
|
``` |