|
|
# Pop2Piano
|
|
|
Pop2Piano is a Transformer network that generates piano covers from the waveforms of pop music.
|
# Model Details |
|
|
|
Pop2Piano was proposed in the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee. |
|
|
|
Piano covers of pop music are widely enjoyed, but generating them is not a trivial task. It requires great expertise in playing the piano as well as knowledge of a song's different characteristics and melodies. With Pop2Piano you can generate a piano cover directly from a song's audio waveform. It is the first model to generate a piano cover directly from pop audio without melody and chord extraction modules.
|
|
|
Pop2Piano is an encoder-decoder Transformer model based on [T5](https://arxiv.org/pdf/1910.10683.pdf). The input audio is converted to a log-mel spectrogram and passed to the encoder, which transforms it into a latent representation. The decoder uses these latent representations to generate token ids in an autoregressive way. Each token id corresponds to one of four token types: time, velocity, note and 'special'. The token ids are then decoded into a MIDI file.
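
To make these token types concrete, here is a purely illustrative sketch of how a stream of relative time, velocity and note events could be rendered into a MIDI file with `pretty_midi`. This is not the actual `Pop2PianoTokenizer` logic; the event values and the step size are made up for illustration.

```python
import pretty_midi

# Hypothetical event stream in the spirit of the four token types above.
# In the real model these come from token ids decoded by Pop2PianoTokenizer;
# here they are hand-written (type, value) pairs purely for illustration.
events = [
    ("time", 2),        # advance the clock by 2 steps
    ("velocity", 77),   # set the current velocity
    ("note", 60),       # note-on for middle C at the current time/velocity
    ("time", 4),
    ("velocity", 0),    # velocity 0 -> subsequent note events act as note-offs
    ("note", 60),       # note-off for middle C
]

seconds_per_step = 0.125  # assumed step size, for illustration only
current_time, current_velocity = 0.0, 64
active = {}  # pitch -> (onset time, velocity)

midi = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # acoustic grand piano

for kind, value in events:
    if kind == "time":
        current_time += value * seconds_per_step
    elif kind == "velocity":
        current_velocity = value
    elif kind == "note":
        if current_velocity > 0:  # note-on
            active[value] = (current_time, current_velocity)
        elif value in active:     # note-off
            start, vel = active.pop(value)
            piano.notes.append(
                pretty_midi.Note(velocity=vel, pitch=value, start=start, end=current_time)
            )

midi.instruments.append(piano)
midi.write("toy_decoded.mid")
```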
|
|
|
## Model Sources |
|
|
|
- [**Paper**](https://arxiv.org/abs/2211.00895) |
|
- [**Original Repository**](https://github.com/sweetcocoa/pop2piano) |
|
- [**HuggingFace Space Demo**](https://huggingface.co/spaces/sweetcocoa/pop2piano) |
|
|
|
# Usage |
|
|
|
To use Pop2Piano, you will need to install the 🤗 Transformers library, as well as the following third party modules: |
|
|
|
``` |
|
pip install git+https://github.com/huggingface/transformers.git
|
pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy |
|
``` |
|
Please note that you may need to restart your runtime after installation. |
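
As a quick, optional sanity check after installation (and after restarting the runtime), you can verify that the third-party modules and the Pop2Piano classes import cleanly:

```python
>>> import essentia
>>> import librosa
>>> import pretty_midi
>>> import scipy

>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
```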
|
|
|
## Pop music to Piano |
|
|
|
### Code Example |
|
|
|
- Using your own Audio |
|
|
|
```python |
|
>>> import librosa |
|
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor |
|
|
|
>>> audio, sr = librosa.load("<your_audio_file_here>", sr=44100) # feel free to change the sr to a suitable value. |
|
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano") |
|
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano") |
|
|
|
>>> inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt") |
|
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1") |
|
>>> tokenizer_output = processor.batch_decode( |
|
... token_ids=model_output, feature_extractor_output=inputs |
|
... )["pretty_midi_objects"][0] |
|
>>> tokenizer_output.write("./Outputs/midi_output.mid") |
|
``` |
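
If you want to listen to the result without a separate MIDI player, one option is to render the `PrettyMIDI` object to a waveform. The following is a minimal sketch using `pretty_midi`'s built-in sine-wave synthesis and `scipy`; like the example above, it assumes the `./Outputs` directory already exists.

```python
>>> import numpy as np
>>> import scipy.io.wavfile

>>> rendered = tokenizer_output.synthesize(fs=44100)  # simple sine-wave rendering of the MIDI
>>> scipy.io.wavfile.write("./Outputs/midi_output.wav", 44100, rendered.astype(np.float32))
```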
|
|
|
- Audio from Hugging Face Hub |
|
|
|
```python |
|
>>> from datasets import load_dataset |
|
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor |
|
|
|
>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano") |
|
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano") |
|
>>> ds = load_dataset("sweetcocoa/pop2piano_ci", split="test") |
|
|
|
>>> inputs = processor( |
|
... audio=ds["audio"][0]["array"], sampling_rate=ds["audio"][0]["sampling_rate"], return_tensors="pt" |
|
... ) |
|
>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1") |
|
>>> tokenizer_output = processor.batch_decode( |
|
... token_ids=model_output, feature_extractor_output=inputs |
|
... )["pretty_midi_objects"][0] |
|
>>> tokenizer_output.write("./Outputs/midi_output.mid") |
|
``` |
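
Both examples above process a single clip. Based on the current `Pop2PianoProcessor` implementation, several clips can also be processed in one batch by passing lists of audio arrays and sampling rates together with an attention mask; treat the following as a sketch (the file paths are placeholders):

```python
>>> import librosa
>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

>>> audio1, sr1 = librosa.load("<first_audio_file>", sr=44100)   # placeholder path
>>> audio2, sr2 = librosa.load("<second_audio_file>", sr=44100)  # placeholder path

>>> inputs = processor(
...     audio=[audio1, audio2],
...     sampling_rate=[sr1, sr2],
...     return_attention_mask=True,  # needed so generate() can ignore padding across the batch
...     return_tensors="pt",
... )
>>> model_output = model.generate(
...     input_features=inputs["input_features"],
...     attention_mask=inputs["attention_mask"],
...     composer="composer1",
... )
>>> midi_objects = processor.batch_decode(
...     token_ids=model_output, feature_extractor_output=inputs
... )["pretty_midi_objects"]
>>> for i, midi in enumerate(midi_objects):
...     midi.write(f"./Outputs/midi_output_{i}.mid")
```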
|
|
|
## Example |
|
Here is an example of the original pop audio along with the piano cover MIDI generated from it (rendered as audio).
|
|
|
- Actual Pop Music |
|
|
|
<audio controls> |
|
<source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/0/audio/audio.mp3" type="audio/mpeg"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
|
|
- Generated MIDI |
|
|
|
<audio controls> |
|
<source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/1/audio/audio.mp3" type="audio/mpeg"> |
|
Your browser does not support the audio element. |
|
</audio> |
|
|
|
## Tips |
|
|
|
1. Pop2Piano is an encoder-decoder based model, like T5.
2. Pop2Piano can be used to generate MIDI files for a given audio sequence.
3. Choosing different composers in `Pop2PianoForConditionalGeneration.generate()` can lead to a variety of different results (a sketch for listing the available composer names follows this list).
4. Setting the sampling rate to 44.1 kHz when loading the audio file can give good performance.
5. Though Pop2Piano was mainly trained on Korean pop music, it also performs quite well on Western pop and hip-hop songs.
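
Regarding tip 3, the composer names accepted by `generate()` appear to be stored in the model's generation config (an assumption based on the current `transformers` implementation); a quick way to list them:

```python
>>> from transformers import Pop2PianoForConditionalGeneration

>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
>>> # Assumed attribute: mapping from composer name to its conditioning token id.
>>> print(list(model.generation_config.composer_to_feature_token))
```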
|
|
|
|
|
# Citation |
|
|
|
**BibTeX:** |
|
``` |
|
@misc{choi2023pop2piano, |
|
title={Pop2Piano : Pop Audio-based Piano Cover Generation}, |
|
author={Jongho Choi and Kyogu Lee}, |
|
year={2023}, |
|
eprint={2211.00895}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.SD} |
|
} |
|
``` |