---
license: cc-by-4.0
language:
- en
library_name: nemo
pipeline_tag: token-classification
tags:
- G2P
- Grapheme-to-Phoneme
---

# English G2P token classification model

This is a non-autoregressive model for English grapheme-to-phoneme (G2P) conversion, based on the BERT architecture. It predicts phonemes in CMU format.

The initial data was built using CMUdict v0.07.

## Intended uses & limitations

The input is expected to contain English words made up of Latin letters and apostrophes, with all characters separated by spaces.

### How to use

Install NeMo.

Download `en_g2p.nemo` (this model):

```bash
git lfs install
git clone https://huggingface.co/bene-ges/en_g2p_cmu_bert_large
```

Run inference:

```bash
# NEMO_ROOT should point to your local clone of the NeMo repository
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
  pretrained_model=en_g2p_cmu_bert_large/en_g2p.nemo \
  inference.from_file=input.txt \
  inference.out_file=output.txt \
  model.max_sequence_len=64 \
  inference.batch_size=128 \
  lang=en
```

Example of input file:

```
g e f f e r t
p r o s c r i b e d
p r o m i n e n t l y
j o c e l y n
m a r c e c a ' s
s t a n k o w s k i
m u f f l e
```
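
A minimal sketch of how a plain word could be converted into this input format. The helper name `to_g2p_input` is hypothetical, and lowercasing is an assumption based on the lowercase examples above, not a documented requirement of the model.

```python
import re

# Allowed characters per the description above: Latin letters and apostrophes.
# Restricting to lowercase a-z is an assumption drawn from the examples.
_ALLOWED = re.compile(r"[a-z']+")


def to_g2p_input(word: str) -> str:
    """Lowercase a word and separate its characters with single spaces."""
    cleaned = word.strip().lower()
    if not _ALLOWED.fullmatch(cleaned):
        raise ValueError(f"unsupported characters in {word!r}")
    # Joining a string with " " puts one space between every character.
    return " ".join(cleaned)


if __name__ == "__main__":
    for w in ["muffle", "Jocelyn", "marceca's"]:
        print(to_g2p_input(w))
```

Writing one such line per word to `input.txt` should produce a file shaped like the example above.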

Example of output file:

```
G EH1 F ER0 T g e f f e r t G EH1 <DELETE> F <DELETE> ER0 T G EH1 <DELETE> F <DELETE> ER0 T PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R OW0 S K R AY1 B D p r o s c r i b e d P R OW0 S K R AY1 B <DELETE> D P R OW0 S K R AY1 B <DELETE> D PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R AA1 M AH0 N AH0 N T L IY0 p r o m i n e n t l y P R AA1 M AH0 N AH0 N T L IY0 P R AA1 M AH0 N AH0 N T L IY0 PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
JH AO1 S L IH0 N j o c e l y n JH AO1 S <DELETE> L IH0 N JH AO1 S <DELETE> L IH0 N PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AA0 R S EH1 K AH0 Z m a r c e c a ' s M AA0 R S EH1 K AH0 <DELETE> Z M AA0 R S EH1 K AH0 <DELETE> Z PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
S T AH0 NG K AO1 F S K IY0 s t a n k o w s k i S T AH0 NG K AO1 F S K IY0 S T AH0 NG K AO1 F S K IY0 PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AH1 F AH0L m u f f l e M AH1 <DELETE> F AH0_L <DELETE> M AH1 <DELETE> F AH0_L <DELETE> PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
```

Note that the correct output tags are in the **third** column; the input is in the second column.

Tags correspond to input letters one-to-one. If you remove the `<DELETE>` tags and replace `_` with a space, you should get a CMU-like transcription.
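
The post-processing rule above can be sketched in a few lines of Python. The function names are hypothetical, and `parse_output_line` assumes the columns of the output file are tab-separated, which you should verify against your own `output.txt`.

```python
def tags_to_phonemes(tags: str) -> str:
    """Turn a space-separated tag string into a CMU-like transcription."""
    # Drop the <DELETE> tags, which mark letters that produce no phoneme.
    kept = [tag for tag in tags.split() if tag != "<DELETE>"]
    # "_" joins several phonemes emitted for a single letter.
    return " ".join(kept).replace("_", " ")


def parse_output_line(line: str) -> tuple[str, str]:
    """Return (word, phonemes) for one line of the output file.

    Assumption: columns are separated by tabs; the input letters are in
    the second column and the tags in the third.
    """
    columns = line.rstrip("\n").split("\t")
    letters, tags = columns[1], columns[2]
    return letters.replace(" ", ""), tags_to_phonemes(tags)


if __name__ == "__main__":
    print(tags_to_phonemes("M AH1 <DELETE> F AH0_L <DELETE>"))
```

Applied to the tags for `m u f f l e`, this yields `M AH1 F AH0 L`, whereas the first column of the example output shows `AH0L`.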

### How to use for TTS

See this [script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/tts_en_infer_from_cmu_phonemes.py) to run TTS directly from CMU phonemes.