Audio Data Ownership

Installation

conda env create -n audio_ethics --file gen_audio_ethics_3.10.yml

To set up wandb, follow the quickstart guide: https://docs.wandb.ai/quickstart

Run Encoder Attack

cd src

python test_encoder_attack.py

Overview

Task 1: Audio Completion with Diffusion Models

For this task, we use the Free Music Archive (FMA), a collection of royalty-free music. You can use any version of the dataset you wish, but we use the fma_large partition to train an initial system.

Note: If your librosa version is too new, pad_center no longer accepts the size as a positional argument, so you have to edit the corresponding line in audioldm to read fft_window = pad_center(fft_window, size=filter_length)
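If you would rather not edit the file by hand, the change described in the note can be sketched as a small script. This is only an illustration: it assumes the unpatched line passes the size positionally, as in pad_center(fft_window, filter_length), and the helper names patch_pad_center and patch_file are hypothetical, not part of this repo or of audioldm. You must supply the path to the audioldm file yourself.

```python
import re
from pathlib import Path

# Assumption: the old call looks like pad_center(<array>, <size>) with the
# size passed positionally. Calls that already use size=... are not matched,
# because "=" is excluded from the second argument.
_POSITIONAL_CALL = re.compile(
    r"pad_center\(\s*([^,()]+?)\s*,\s*([^,()=]+?)\s*\)"
)


def patch_pad_center(source: str) -> str:
    """Rewrite pad_center(x, n) as pad_center(x, size=n); leave the rest alone."""
    return _POSITIONAL_CALL.sub(r"pad_center(\1, size=\2)", source)


def patch_file(path) -> bool:
    """Patch one file in place. Returns True if the file was changed."""
    path = Path(path)
    original = path.read_text()
    patched = patch_pad_center(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False
```

Running patch_file twice on the same file is safe, since the second pass finds nothing left to rewrite.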

To preprocess FMA, configure the preprocessing file with your own data path, then run the matching preprocessing script to convert the .mp3 files to NumPy arrays (loading audio files during training is prohibitively slow).

  • Preprocessing for ArchiSound encoders: nohup python -u scripts/data_processing/process_music_numpy.py > logs/process_48k_music.out &
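To give a rough idea of what an .mp3 to NumPy conversion step looks like, here is a minimal sketch. It is not the contents of process_music_numpy.py: the function names, the directory layout, and the 48 kHz sample rate (guessed from the log file name above) are all assumptions. It relies on librosa.load and numpy.save, which are imported only when the conversion actually runs.

```python
from pathlib import Path


def npy_path_for(mp3_path, src_root, dst_root) -> Path:
    """Mirror the source tree: <src_root>/a/b.mp3 -> <dst_root>/a/b.npy."""
    relative = Path(mp3_path).relative_to(src_root)
    return Path(dst_root) / relative.with_suffix(".npy")


def convert_tree(src_root, dst_root, sample_rate=48000):
    """Decode every .mp3 under src_root once and store it as a float32 .npy.

    sample_rate=48000 is an assumption, not a value taken from the repo.
    """
    import librosa  # third-party; imported lazily so the helper above stays light
    import numpy as np

    for mp3 in sorted(Path(src_root).rglob("*.mp3")):
        out = npy_path_for(mp3, src_root, dst_root)
        if out.exists():
            continue  # already converted on an earlier run
        out.parent.mkdir(parents=True, exist_ok=True)
        audio, _ = librosa.load(mp3, sr=sample_rate, mono=True)
        np.save(out, audio.astype(np.float32))
```

Doing the decode once up front means the training loop only has to read arrays from disk instead of decoding compressed audio on every epoch.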

Task 2: TTS with Diffusion Models

Diffusion (or flow) models are one of several approaches currently used to reach state-of-the-art TTS performance. This repo contains a model similar to Grad-TTS; the example inference figure for Grad-TTS is shown below:

Inference Figure for Grad-TTS

To run, first you need to build the monotonic_align code:

cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..

You may have to move the generated .so file into the monotonic_align/ directory if it ends up in monotonic_align/build/.
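If the build does leave the shared object under build/, the move can be sketched as below. The helper name collect_built_extensions is hypothetical, and the sketch assumes the compiled file has a .so suffix and sits somewhere under the build/ subdirectory of the package.

```python
import shutil
from pathlib import Path


def collect_built_extensions(package_dir):
    """Move every .so found under <package_dir>/build/ up into <package_dir>.

    Returns the list of new file locations (empty if nothing was found).
    """
    package_dir = Path(package_dir)
    build_dir = package_dir / "build"
    moved = []
    if not build_dir.is_dir():
        return moved  # nothing was built into build/, so nothing to do
    for so_file in sorted(build_dir.rglob("*.so")):
        target = package_dir / so_file.name
        shutil.move(str(so_file), str(target))
        moved.append(target)
    return moved
```

From the repository root, this would be called as collect_built_extensions("model/monotonic_align") after the build_ext step has finished.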