|
# Self-Training with Kaldi HMM Models |
|
This folder contains recipes for self-training on pseudo phone transcripts and
decoding into phones or words with [kaldi](https://github.com/kaldi-asr/kaldi).
|
|
|
To start, download and install kaldi following its instructions, and place this
folder in `path/to/kaldi/egs`.
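
For reference, a minimal setup could look like the following sketch; the clone
location and recipe path are placeholders, and kaldi's own `INSTALL` files
remain the authoritative build guide:

```
git clone https://github.com/kaldi-asr/kaldi
cd kaldi/tools && make            # see tools/INSTALL for prerequisites
cd ../src && ./configure && make depend && make
cd ../..

# place this recipe folder under egs/
cp -r /path/to/this/folder kaldi/egs/
```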
|
|
|
## Training |
|
Assuming the following have been prepared:
- `w2v_dir`: contains features `{train,valid}.{npy,lengths}`, real transcripts `{train,valid}.${label}`, and dict `dict.${label}.txt`
- `lab_dir`: contains pseudo labels `{train,valid}.txt`
- `arpa_lm`: Arpa-format n-gram phone LM for decoding
- `arpa_lm_bin`: Arpa-format n-gram phone LM, used with KenLM, for unsupervised model selection
|
|
|
Set these variables in `train.sh`, along with `out_dir` (the output directory),
and then run it.
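
A sketch of the variable block at the top of `train.sh`; all paths are
placeholders for your own data, and the script may define additional settings:

```
w2v_dir=/path/to/w2v_features      # {train,valid}.{npy,lengths}, transcripts, dict
lab_dir=/path/to/pseudo_labels     # {train,valid}.txt
arpa_lm=/path/to/phone_lm.arpa     # phone LM for decoding
arpa_lm_bin=/path/to/phone_lm.bin  # phone LM for model selection (KenLM)
out_dir=/path/to/out               # output directory
```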
|
|
|
The output will look like:
|
```
==== WER w.r.t. real transcript (select based on unsupervised metric)
INFO:root:./out/exp/mono/decode_valid/scoring/14.0.0.tra.txt: score 0.9178 wer 28.71% lm_ppl 24.4500 gt_wer 25.57%
INFO:root:./out/exp/tri1/decode_valid/scoring/17.1.0.tra.txt: score 0.9257 wer 26.99% lm_ppl 30.8494 gt_wer 21.90%
INFO:root:./out/exp/tri2b/decode_valid/scoring/8.0.0.tra.txt: score 0.7506 wer 23.15% lm_ppl 25.5944 gt_wer 15.78%
```
|
where `wer` is the word error rate with respect to the pseudo labels, `gt_wer`
the word error rate with respect to the ground-truth labels, `lm_ppl` the
language-model perplexity of the HMM-predicted transcripts, and `score` the
unsupervised metric for model selection. We choose the model and LM parameter
with the lowest score. In the example above, that is `tri2b` with `8.0.0`.
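
Because selection is simply by lowest `score`, the winning entry can be picked
out mechanically. A minimal sketch, assuming the selection lines above were
saved to a hypothetical `train.log`:

```
# sort numerically by the third field (the "score" value) and keep the smallest
grep ' score ' train.log | sort -k3,3g | head -n 1
```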
|
|
|
|
|
## Decoding into Phones |
|
In `decode_phone.sh`, set `out_dir` to the same value used in `train.sh`, and
set `dec_exp` and `dec_lmparam` to the selected model and LM parameter (e.g.
`tri2b` and `8.0.0` in the example above). `dec_script` must be set
according to `dec_exp`: for mono/tri1/tri2b, use `decode.sh`; for tri3b, use
`decode_fmllr.sh`.
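
Continuing the running example, the settings in `decode_phone.sh` would look
like this (the output path is a placeholder):

```
out_dir=/path/to/out   # same as in train.sh
dec_exp=tri2b          # selected model
dec_lmparam=8.0.0      # selected LM parameter
dec_script=decode.sh   # would be decode_fmllr.sh if dec_exp were tri3b
```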
|
|
|
The output will be saved at `out_dir/dec_data`.
|
|
|
|
|
## Decoding into Words |
|
`decode_word_step1.sh` prepares WFSTs for word decoding. Besides the variables
mentioned above, set the following (a sketch follows the list):
- `wrd_arpa_lm`: Arpa-format n-gram word LM for decoding
- `wrd_arpa_lm_bin`: Arpa-format n-gram word LM for unsupervised model selection
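
A sketch of these two settings, with placeholder paths:

```
wrd_arpa_lm=/path/to/word_lm.arpa      # word LM for decoding
wrd_arpa_lm_bin=/path/to/word_lm.bin   # word LM for unsupervised model selection
```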
|
|
|
`decode_word_step1.sh` decodes the `train` and `valid` splits into words and
runs unsupervised model selection using the `valid` split. The output looks like:
|
```
INFO:root:./out/exp/tri2b/decodeword_valid/scoring/17.0.0.tra.txt: score 1.8693 wer 24.97% lm_ppl 1785.5333 gt_wer 31.45%
```
|
|
|
After determining the LM parameter (`17.0.0` in the example above), set it in
`decode_word_step2.sh` and run it. The output will be saved at
`out_dir/dec_data_word`.
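
For the example above, step 2 amounts to the following sketch (the variable
name `dec_lmparam` is an assumption, mirroring the phone-decoding step; check
the script for the actual name):

```
# in decode_word_step2.sh, set the selected LM parameter:
dec_lmparam=17.0.0

# then run it; results go to $out_dir/dec_data_word
./decode_word_step2.sh
```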
|
|