Music Mixing Style Transfer
This repository includes source code and pre-trained models of the work Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects by Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, and Yuki Mitsufuji.
Pre-trained Models
Model | Configuration | Training Dataset |
---|---|---|
FXencoder (Φp.s.) | Used FX normalization and probability scheduling techniques for training | Trained with MUSDB18 Dataset |
MixFXcloner | Mixing style converter trained with Φp.s. | Trained with MUSDB18 Dataset |
Installation
pip install -r "requirements.txt"
Inference
Mixing Style Transfer
To run the inference code for mixing style transfer,
- Download pre-trained models above and place them under the folder named 'weights' (default)
- Prepare the input and reference tracks under the folder named 'samples/style_transfer' (default). Target files should be organized as follows:
"path_to_data_directory"/"song_name_#1"/"input_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
...
"path_to_data_directory"/"song_name_#n"/"input_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
- Run 'inference/style_transfer.py'
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples"
- Outputs will be stored under the same folder as the inference data directory (default)
Note: The system accepts stereo WAV files at 44.1 kHz with 16-bit depth. We recommend using audio samples that are not too loud: the system transfers these samples better when the loudness of the mixture-wise inputs is reduced (maintaining the overall balance of each instrument).
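If you want to sanity-check your files before running inference, the following standalone sketch (not part of this repository; it assumes the `soundfile` package is installed and that samples follow the default 'samples/style_transfer' layout described above) reports whether each WAV file is stereo, 44.1 kHz, and 16-bit:

```python
# check_inputs.py -- a minimal, optional sketch for verifying input format.
# Assumptions: the `soundfile` package is installed and samples follow the
# default "samples/style_transfer/<song_name>/<file>.wav" layout.
import os
import soundfile as sf

TARGET_DIR = "samples/style_transfer"  # default inference directory

for song in sorted(os.listdir(TARGET_DIR)):
    song_dir = os.path.join(TARGET_DIR, song)
    if not os.path.isdir(song_dir):
        continue
    for fname in sorted(os.listdir(song_dir)):
        if not fname.lower().endswith(".wav"):
            continue
        info = sf.info(os.path.join(song_dir, fname))
        ok = (info.channels == 2
              and info.samplerate == 44100
              and info.subtype == "PCM_16")
        print(f"{song}/{fname}: {info.channels}ch, {info.samplerate} Hz, "
              f"{info.subtype} -> {'ok' if ok else 'needs conversion'}")
```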
Interpolation With 2 Different Reference Tracks
Inference with two reference tracks works almost the same way as the mixing style transfer above.
- Download pre-trained models above and place them under the folder named 'weights' (default)
- Prepare the input and two reference tracks under the folder named 'samples/style_transfer' (default). Target files should be organized as follows:
"path_to_data_directory"/"song_name_#1"/"input_track_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name_2interpolate".wav
...
"path_to_data_directory"/"song_name_#n"/"input_track_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name_2interpolate".wav
- Run 'inference/style_transfer.py'
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples" \
--interpolation True \
--interpolate_segments "number of segments to perform interpolation"
- Outputs will be stored under the same folder as the inference data directory (default)
Note: Interpolating between two different reference tracks is not covered in the paper, but it demonstrates the potential of controllable style transfer in the latent space.
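As a rough illustration of that idea, the sketch below shows one simple way two reference embeddings could be blended segment by segment. This is not the repository's implementation: the tensor names, the embedding dimensionality, and the plain linear blend are assumptions for illustration only.

```python
# Conceptual sketch only -- not the repository's implementation.
# `z_ref_a` / `z_ref_b` stand for FXencoder embeddings of the two reference
# tracks; their names and shapes are assumptions made for this example.
from typing import List
import torch

def interpolate_embeddings(z_ref_a: torch.Tensor,
                           z_ref_b: torch.Tensor,
                           num_segments: int) -> List[torch.Tensor]:
    """Linearly blend two style embeddings across `num_segments` segments.

    The first segment follows reference A only and the last segment follows
    reference B only, so the mixing style morphs gradually over the song.
    """
    weights = torch.linspace(0.0, 1.0, num_segments)
    return [(1.0 - w) * z_ref_a + w * z_ref_b for w in weights]

# Example: blend two hypothetical 2048-dimensional embeddings over 5 segments.
z_a, z_b = torch.randn(2048), torch.randn(2048)
blended = interpolate_embeddings(z_a, z_b, num_segments=5)
print(len(blended), blended[0].shape)
```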
Feature Extraction Using FXencoder
This inference code extracts audio-effects-related embeddings using our proposed FXencoder. It processes all .wav files under the target directory.
- Download FXencoder's pre-trained model above and place it under the folder named 'weights' (default)
- Run 'inference/feature_extraction.py'
python inference/feature_extraction.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--target_dir "path_to_directory_containing_inference_samples"
- Outputs will be stored under the same folder as the inference data directory (default)
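One possible use of the extracted embeddings, sketched below, is comparing the mixing style of two songs via cosine similarity. The file names and paths in this sketch are placeholders, not the actual outputs written by 'inference/feature_extraction.py'; adapt them to whatever the script produces.

```python
# Optional sketch -- compares two extracted FX embeddings by cosine similarity.
# The .npy paths below are placeholders; point them at the files that
# feature_extraction.py actually writes out.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.flatten(), b.flatten()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

emb_a = np.load("samples/song_a/fx_embedding.npy")  # placeholder path
emb_b = np.load("samples/song_b/fx_embedding.npy")  # placeholder path
print(f"FX-embedding similarity: {cosine_similarity(emb_a, emb_b):.3f}")
```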
Implementation
All the details of our system implementation are under the folder "mixing_style_transfer".
Citation
Please consider citing the work if you use it.
@article{koo2022music,
title={Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects},
author={Koo, Junghyun and Martinez-Ramirez, Marco A and Liao, Wei-Hsiang and Uhlich, Stefan and Lee, Kyogu and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2211.02247},
year={2022}
}