# Data preparation

Scripts to process the [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/) datasets so that they are compatible with training modern TTS architectures.

## Requirements

`sox`, `ffmpeg`

### Processing steps

#### Downloads

Download [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/).

#### Variables definition

Open the shell script `.../data_processing/process_data.sh` and modify the following fields:

```bash
### Festcat variables ###
export PATH_TO_FESTCAT_SHELL='.../data_processing/festcat_processing_test.sh'  # Absolute path to festcat_processing_test.sh script
export PATH_TO_FESTCAT_PY='.../data_processing/extract_festcat.py'  # Absolute path to extract_festcat.py script
export PATH_TO_FESTCAT_DATA='.../festcat/'  # Path to Festcat dataset
export FESTCAT_FINAL_PATH='.../festcat_processed'  # Path where preprocessed Festcat will be stored

### Google_tts variables ###
export PATH_TO_GOOGLE_TTS_SHELL='.../data_processing/google_tts_processing_test.sh'  # Absolute path to google_tts_processing_test.sh script
export PATH_TO_GOOGLE_TTS_PY='.../data_processing/extract_google_tts.py'  # Absolute path to extract_google_tts.py script
export PATH_TO_GOOGLE_TTS_DATA='.../google_tts'  # Path to Google TTS dataset
export GOOGLE_TTS_FINAL_PATH='.../google_tts_processed'  # Path where preprocessed Google TTS will be stored

### General variables ###
export VCTK_FORMATER_PATH='.../data_processing/ca_multi2vckt.py'  # Absolute path to ca_multi2vckt.py script
export FINAL_PATH='.../multispeaker_ca_test/'  # Path where preprocessed and VCTK-formatted datasets will be stored
```

#### Run preprocessing

Once the variables are correctly defined, run the following command in a terminal:

`sh <...>/data_processing/process_data.sh`

The processed data, in VCTK format, will be written to the directory defined by `FINAL_PATH` (`export FINAL_PATH='.../multispeaker_ca_test/'`).
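
Because the pipeline depends on `sox`, `ffmpeg`, and several hand-edited paths, it may help to verify them before launching `process_data.sh`. The following is only a minimal sketch in POSIX `sh`: the helper names `check_tools` and `check_paths` are hypothetical and are not part of this repository, and the sketch assumes the variables above have been exported in the current shell.

```shell
# Hypothetical pre-flight helpers (assumption: not provided by this repository).

# Succeeds only if every command named in the arguments is found on PATH.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Succeeds only if every variable named in the arguments is set
# and its value is an existing file or directory.
check_paths() {
    status=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ] || [ ! -e "$value" ]; then
            echo "unset or missing path: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

One possible use is to run `check_tools sox ffmpeg` and `check_paths PATH_TO_FESTCAT_DATA PATH_TO_GOOGLE_TTS_DATA` first, and to start the preprocessing only if both succeed. Output locations such as `FINAL_PATH` are deliberately left out of the check, since they may not exist before the first run.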