	# MFA based extraction for FastSpeech

	## Prepare
	Run every command below from the root of the main repo, i.e. `TensorFlowTTS/`.

	0. Optional* Modify MFA scripts to work with your language (https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html)

	1. Download the pretrained MFA model and lexicon, then extract the TextGrids:

	- ```
	bash examples/mfa_extraction/scripts/prepare_mfa.sh
	```

	- ```
	python examples/mfa_extraction/run_mfa.py \
	--corpus_directory ./libritts \
	--output_directory ./mfa/parsed \
	--jobs 8
	```

	After this step, the TextGrids are located in `./mfa/parsed`.

	2. Extract durations from the TextGrid files:
	- ```
	python examples/mfa_extraction/txt_grid_parser.py \
	--yaml_path examples/fastspeech2_libritts/conf/fastspeech2libritts.yaml \
	--dataset_path ./libritts \
	--text_grid_path ./mfa/parsed \
	--output_durations_path ./libritts/durations \
	--sample_rate 24000
	```

	- Dataset structure after finishing this step:
	```
	|- TensorFlowTTS/
	| |- LibriTTS/
	| |- |- train-clean-100/
	| |- |- SPEAKERS.txt
	| |- |- ...
	| |- dataset/
	| |- |- 200/
	| |- |- |- 200_124139_000001_000000.txt
	| |- |- |- 200_124139_000001_000000.wav
	| |- |- |- ...
	| |- |- 250/
	| |- |- ...
	| |- |- durations/
	| |- |- train.txt
	| |- tensorflow_tts/
	| |- models/
	| |- ...
	```
	3. Optional* Add your own dataset parser based on `tensorflow_tts/processor/experiment/example_dataset.py` (if the base dataset processor doesn't match your dataset).

	4. Run preprocessing and normalization (steps 4 and 5 in `examples/fastspeech2_libritts/README.MD`).

	5. Run `fix_mismatch.py` to fix differences of a few frames between the audio and duration files:

	- ```
	python examples/mfa_extraction/fix_mismatch.py \
	--base_path ./dump \
	--trimmed_dur_path ./dataset/trimmed-durations \
	--dur_path ./dataset/durations
	```
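
To make the idea behind steps 2 and 5 concrete, here is a minimal sketch of how aligned phone intervals can be turned into frame durations and then reconciled with the number of mel frames. It is an illustration only, not the actual logic of `txt_grid_parser.py` or `fix_mismatch.py`. The function names are hypothetical, and `hop_size=300` is an assumed value that would normally come from the YAML config.

```python
def intervals_to_frames(intervals, sample_rate=24000, hop_size=300):
    """Convert (start_sec, end_sec) phone intervals into per-phone frame counts.

    hop_size=300 is an assumption for this sketch; read it from your config.
    """
    durations = []
    for start, end in intervals:
        # Round the boundaries rather than the lengths, so that rounding
        # errors do not accumulate across phones.
        start_frame = round(start * sample_rate / hop_size)
        end_frame = round(end * sample_rate / hop_size)
        durations.append(end_frame - start_frame)
    return durations


def fix_length(durations, n_mel_frames):
    """Return a copy of durations whose sum equals n_mel_frames."""
    if not durations:
        raise ValueError("durations must not be empty")
    durations = list(durations)
    diff = n_mel_frames - sum(durations)
    if diff >= 0:
        # Too few duration frames: extend the last phone.
        durations[-1] += diff
    else:
        # Too many duration frames: trim from the end, never below zero.
        surplus = -diff
        i = len(durations) - 1
        while surplus > 0 and i >= 0:
            take = min(surplus, durations[i])
            durations[i] -= take
            surplus -= take
            i -= 1
    return durations


# Three phones covering 0.5 s of audio at 24 kHz with a hop of 300 samples.
durations = intervals_to_frames([(0.0, 0.1), (0.1, 0.25), (0.25, 0.5)])
print(durations)                  # [8, 12, 20]
print(fix_length(durations, 42))  # [8, 12, 22]
```

Adjusting only the trailing phones keeps the earlier alignment untouched, which seems reasonable when the mismatch is only a few frames.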

	## Problems with MFA extraction
	MFA seems to have problems with trimmed files; in my experiments it works better with ~100 ms of silence at the start and end of each file.
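
A minimal sketch of adding that padding, assuming the audio is already loaded as a sequence of PCM samples. The function name `pad_with_silence` is hypothetical and not part of this repo.

```python
def pad_with_silence(samples, sample_rate, pad_ms=100):
    """Return samples with pad_ms of digital silence added at both ends."""
    n_pad = int(sample_rate * pad_ms / 1000)
    silence = [0] * n_pad
    return silence + list(samples) + silence


# At 24 kHz, 100 ms corresponds to 2400 samples on each side.
padded = pad_with_silence([1, 2, 3], sample_rate=24000)
print(len(padded))  # 4803
```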

	Short files can produce a lot of false positives, such as alignments that contain only silence (seen with LibriTTS), so I would keep only samples longer than 2 s.
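
One way to apply such a length filter before running MFA is sketched below. It uses only the standard library and assumes the files are uncompressed PCM WAV. The helper names and the `min_seconds` parameter are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def keep_long_enough(wav_paths, min_seconds=2.0):
    """Keep only the files strictly longer than min_seconds."""
    return [p for p in wav_paths if wav_duration_seconds(p) > min_seconds]
```

For example, `keep_long_enough(sorted(Path("./libritts").rglob("*.wav")))` (with `from pathlib import Path`) would return the paths worth aligning, assuming the corpus lives in `./libritts` as in the commands above.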