---
license: cc-by-4.0
language:
- en
library_name: nemo
pipeline_tag: token-classification
tags:
- G2P
- Grapheme-to-Phoneme
---

# English G2P token classification model

This is a non-autoregressive model for English grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in the CMU Pronouncing Dictionary (CMUdict) format.
The initial data was built using CMUdict v0.07.


## Intended uses & limitations

The input is expected to contain English words made up of Latin letters and the apostrophe, one word per line, with all characters separated by single spaces.
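
A minimal sketch of how a plain word list could be turned into this format. The helper name `to_g2p_input` is hypothetical and not part of NeMo. Lowercasing is an assumption based on the lowercase sample input in this card, as is rejecting any character other than a Latin letter or an apostrophe.

```python
import re


def to_g2p_input(word: str) -> str:
    """Return the word with its characters separated by single spaces.

    Assumption: the model expects lowercase input, as in the sample file.
    """
    word = word.strip().lower()
    # Only Latin letters and the apostrophe are described as valid input.
    if not re.fullmatch(r"[a-z']+", word):
        raise ValueError(f"unsupported characters in {word!r}")
    return " ".join(word)


if __name__ == "__main__":
    words = ["geffert", "marceca's", "muffle"]
    # One spaced-out word per line, ready to be saved as input.txt.
    with open("input.txt", "w", encoding="utf-8") as f:
        for w in words:
            f.write(to_g2p_input(w) + "\n")
```

Words containing digits, hyphens, or non-Latin letters raise a `ValueError` here rather than being silently passed to the model.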

### How to use

Install NeMo.

Download `en_g2p.nemo` (this model):
```bash
git lfs install
git clone https://huggingface.co/bene-ges/en_g2p_cmu_bert_large
```

Run inference:

```bash
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
  pretrained_model=en_g2p_cmu_bert_large/en_g2p.nemo \
  inference.from_file=input.txt \
  inference.out_file=output.txt \
  model.max_sequence_len=64 \
  inference.batch_size=128 \
  lang=en
```

Example of input file:
```
g e f f e r t
p r o s c r i b e d
p r o m i n e n t l y
j o c e l y n
m a r c e c a ' s
s t a n k o w s k i
m u f f l e
```

Example of output file:
```
G EH1  F  ER0 T	               g e f f e r t           G EH1 <DELETE> F <DELETE> ER0 T   G EH1 <DELETE> F <DELETE> ER0 T   PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R OW0 S K R AY1 B  D         p r o s c r i b e d	   P R OW0 S K R AY1 B <DELETE> D    P R OW0 S K R AY1 B <DELETE> D    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R AA1 M AH0 N AH0 N T L IY0  p r o m i n e n t l y   P R AA1 M AH0 N AH0 N T L IY0     P R AA1 M AH0 N AH0 N T L IY0     PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
JH AO1 S  L IH0 N              j o c e l y n           JH AO1 S <DELETE> L IH0 N         JH AO1 S <DELETE> L IH0 N         PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AA0 R S EH1 K AH0  Z         m a r c e c a ' s       M AA0 R S EH1 K AH0 <DELETE> Z	 M AA0 R S EH1 K AH0 <DELETE> Z    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
S T AH0 NG K AO1 F S K IY0     s t a n k o w s k i     S T AH0 NG K AO1 F S K IY0        S T AH0 NG K AO1 F S K IY0        PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AH1  F AH0L                  m u f f l e	           M AH1 <DELETE> F AH0_L <DELETE>   M AH1 <DELETE> F AH0_L <DELETE>   PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
```

Note that the correct output tags are in the **third** column; the input is in the second column.
Tags correspond to input letters one-to-one. If you remove the `<DELETE>` tags and replace `_` with a space, you should get a CMU-like transcription.
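
The post-processing described above can be sketched as follows. The function names are hypothetical, and the code assumes that the columns in the output file are separated by tab characters; adjust the split if your output uses a different delimiter.

```python
def tags_to_cmu(tag_column: str) -> str:
    """Convert a column of per-letter tags into a CMU-like transcription."""
    phonemes = []
    for tag in tag_column.split():
        if tag == "<DELETE>":
            # This letter contributes no phoneme.
            continue
        # A tag such as AH0_L holds several phonemes joined by "_".
        phonemes.extend(tag.split("_"))
    return " ".join(phonemes)


def parse_output_line(line: str):
    """Return (word, transcription) from one line of the output file.

    Assumption: columns are tab-separated; the second column holds the
    input letters and the third column holds the tags.
    """
    columns = [c.strip() for c in line.rstrip("\n").split("\t")]
    letters, tags = columns[1], columns[2]
    return letters.replace(" ", ""), tags_to_cmu(tags)


if __name__ == "__main__":
    with open("output.txt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                word, transcription = parse_output_line(line)
                print(f"{word}\t{transcription}")
```

For example, the tag sequence `M AH1 <DELETE> F AH0_L <DELETE>` becomes `M AH1 F AH0 L`.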

### How to use for TTS
See this [script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/tts_en_infer_from_cmu_phonemes.py) to run TTS directly from CMU phonemes.