Audio Data Ownership
conda env create -n audio_ethics --file gen_audio_ethics_3.10.yml
To set up wandb, see the following link:
Run Encoder Attack
cd src
Task 1: Audio Completion with Diffusion Models
For this task, we use the Free Music Archive (FMA), a collection of royalty-free music. You can use any version of the dataset you wish, but we use the fma_large
partition to train an initial system.
Note: If your librosa version is too new, pad_center only accepts size as a keyword argument, so edit the corresponding line in audioldm to read fft_window = pad_center(fft_window, size=filter_length)
To preprocess FMA, configure the file with your own paths, then run the appropriate preprocessing script to convert the .mp3
files to numpy arrays (loading audio files directly during training is prohibitively slow).
- Preprocessing for ArchiSound encoders:
nohup python -u scripts/data_processing/ > logs/process_48k_music.out &
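The .mp3-to-numpy conversion described above can be sketched roughly as follows. This is an illustration only, not the repo's preprocessing script: the function name convert_dir, the load_audio callable, and the directory layout are all assumptions. In practice load_audio would wrap whichever decoder you use and return the waveform as an array.

```python
from pathlib import Path
from typing import Callable

import numpy as np


def convert_dir(src: Path, dst: Path, load_audio: Callable[[Path], np.ndarray]) -> int:
    """Decode every .mp3 under src once and save it as a .npy file under dst.

    Hypothetical helper: load_audio is supplied by the caller.
    Files that fail to decode are skipped. Returns the number of files written.
    """
    written = 0
    for mp3 in sorted(src.rglob("*.mp3")):
        # Mirror the source directory structure under dst.
        out = dst / mp3.relative_to(src).with_suffix(".npy")
        if out.exists():
            continue  # allows resuming an interrupted run
        try:
            audio = load_audio(mp3)
        except Exception as err:
            print(f"skipping {mp3}: {err}")
            continue
        out.parent.mkdir(parents=True, exist_ok=True)
        np.save(out, np.asarray(audio, dtype=np.float32))
        written += 1
    return written
```

Decoding once up front means the training loop only has to call np.load, which avoids repeating the .mp3 decode on every epoch.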
Task 2: TTS with Diffusion Models
TTS with diffusion (or flow) models is one of several approaches currently used to reach state-of-the-art TTS performance. This repo contains a model similar to Grad-TTS; example inference for Grad-TTS is shown below:
To run, first build monotonic_align:
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
You may have to move the generated .so file into the monotonic_align/
directory if it was generated in monotonic_align/build/ instead.
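If the compiled file does end up under build/, the move can be scripted. The following is a minimal sketch using only the standard library; the helper name collect_built_extension is hypothetical, and it assumes the build output is a .so file somewhere below monotonic_align/build/.

```python
import shutil
from pathlib import Path
from typing import List


def collect_built_extension(pkg_dir: Path) -> List[Path]:
    """Move compiled .so files from pkg_dir/build/ (searched recursively) into pkg_dir.

    Hypothetical helper; returns the new paths of the files that were moved.
    """
    moved: List[Path] = []
    build_dir = pkg_dir / "build"
    if not build_dir.is_dir():
        return moved
    # sorted() materializes the listing before any file is moved.
    for so_file in sorted(build_dir.rglob("*.so")):
        target = pkg_dir / so_file.name
        if target.exists():
            continue  # an extension is already in place; leave it alone
        shutil.move(str(so_file), str(target))
        moved.append(target)
    return moved
```

For example, calling collect_built_extension(Path("model/monotonic_align")) from the directory where the build command was run would move any built .so file next to the package sources.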