MassivelyMultilingualTTS / Preprocessing /multilinguality /README.md

Flux9665

update to the current version

70399da 12 months ago

	## Zero-Shot Approximation of Language Embeddings
	This directory contains all scripts needed to reproduce the meta-learning for the zero-shot part of our system. These scripts let you predict language representations purely from the distances between languages, as measured by a variety of linguistically informed metrics or, better still, by a learned combination of them.
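
To make the idea concrete, here is a minimal sketch of one way a distance-based approximation could work: the embedding of an unseen language is taken as the inverse-distance-weighted average of the embeddings of its `k` nearest seen languages. This is an illustration under assumptions, not the code in this directory; the function name `approximate_embedding`, the dictionary layouts, and the weighting scheme are all hypothetical.

```python
def approximate_embedding(target, seen_embeddings, distances, k=2):
    """Hypothetical helper: approximate the embedding of an unseen language.

    seen_embeddings maps a language code to its embedding (a list of floats).
    distances maps (target, seen_language) pairs to a non-negative distance.
    """
    # Pick the k seen languages that are closest to the target language.
    neighbors = sorted(seen_embeddings, key=lambda lang: distances[(target, lang)])[:k]
    # Assumed weighting: inverse distance, so closer languages count more.
    weights = [1.0 / (distances[(target, lang)] + 1e-8) for lang in neighbors]
    total = sum(weights)
    dim = len(seen_embeddings[neighbors[0]])
    # Weighted average of the neighbor embeddings, one dimension at a time.
    return [
        sum(w / total * seen_embeddings[lang][i] for w, lang in zip(weights, neighbors))
        for i in range(dim)
    ]


# Toy data: two close languages and one distant language.
seen = {"deu": [1.0, 0.0], "nld": [0.0, 1.0], "jpn": [5.0, 5.0]}
dist = {("xxx", "deu"): 1.0, ("xxx", "nld"): 1.0, ("xxx", "jpn"): 10.0}
approx = approximate_embedding("xxx", seen, dist, k=2)
```

With `k=2`, the distant language is ignored and the two equally close neighbors contribute equally. The `-d` and `-k` arguments of `run_zero_shot_lang_emb_injection.py` select the distance type and the number of neighbors; how the script actually combines the neighbors may differ from this sketch.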


	### Applying zero-shot approximation to a trained model

	Use `run_zero_shot_lang_emb_injection.py` to update the language embeddings of a trained model for all languages that were not seen during training (by default, `supervised_languages.json` is used to determine which languages were seen).
	See the script for arguments that can be passed (e.g. to use a custom model path). Here is an example:
	```
	cd IMS-Toucan/
	python run_zero_shot_lang_emb_injection.py -m <model_path> -d <distance_type> -k <number_of_nearest_neighbors>
	```

	By default, the updated model is saved with a modified filename in the same directory.

	### Cached distance lookups
	In order to apply any zero-shot approximation, cache files for distance lookups are required.

	The ASP lookup file (`asp_dict.pkl`) needs to be downloaded from the release page. All other cache files are automatically generated as required when running `run_zero_shot_lang_emb_injection.py`.
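
The "generate when missing" behavior described above follows a common load-or-build caching pattern. The sketch below shows that pattern with Python's standard `pickle` module. It is an assumption about the general approach, not the script's actual implementation; `load_or_build_lookup` and `build_fn` are hypothetical names.

```python
import pickle
from pathlib import Path


def load_or_build_lookup(cache_path, build_fn):
    """Hypothetical helper: return a cached lookup, building it on first use."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: reuse the stored lookup instead of recomputing it.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: compute the lookup once and store it for later runs.
    lookup = build_fn()
    with cache_path.open("wb") as f:
        pickle.dump(lookup, f)
    return lookup
```

A file such as `asp_dict.pkl` would always take the cache-hit branch, since it is downloaded rather than generated. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.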

	Note: While the map, tree, and inverse ASP distances are model-independent, the learned distance lookup applies only to the model it was trained on, i.e., different Toucan models require different learned-distance lookups. If you want to apply zero-shot approximation to a new model, make sure you are not using an outdated, pre-existing learned distance lookup; train a new learned distance metric instead.