bene-ges committed on
Commit
8bab68f
1 Parent(s): 9a3d45b

Update README.md

Files changed (1)
  1. README.md +62 -1
README.md CHANGED
@@ -7,4 +7,65 @@ pipeline_tag: token-classification
  tags:
  - G2P
  - Grapheme-to-Phoneme
- ---
+ ---
+
+ # English G2P token classification model
+
+ This is a non-autoregressive model for English grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in CMU format.
+ The initial data was built using CMUdict v0.07.
+
+
+ ## Intended uses & limitations
+
+ The input is expected to contain English words consisting of Latin letters and apostrophes, with all letters separated by spaces.
+
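The input format described above can be produced with a few lines of Python. This is a minimal sketch, not part of the model repository: the helper names `to_g2p_input` and `write_input_file` are hypothetical, and lowercasing is an assumption based on the lowercase words in the example input file.

```python
import re

# Assumption: only lowercase Latin letters and apostrophes are valid input symbols.
_ALLOWED = re.compile(r"[a-z']+")


def to_g2p_input(word: str) -> str:
    """Return `word` as space-separated lowercase characters."""
    word = word.strip().lower()
    if not _ALLOWED.fullmatch(word):
        raise ValueError(f"unsupported characters in {word!r}")
    return " ".join(word)


def write_input_file(words, path="input.txt"):
    """Write one space-separated word per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(to_g2p_input(w) + "\n")
```

For example, `to_g2p_input("marceca's")` gives `m a r c e c a ' s`, which matches the form used in the example input file.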
+ ### How to use
+
+ Install NeMo.
+
+ Download en_g2p.nemo (this model):
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/bene-ges/en_g2p_cmu_bert_large
+ ```
+
+ Run
+
+ ```bash
+ python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
+   pretrained_model=en_g2p_cmu_bert_large/en_g2p.nemo \
+   inference.from_file=input.txt \
+   inference.out_file=output.txt \
+   model.max_sequence_len=64 \
+   inference.batch_size=128 \
+   lang=en
+ ```
+
+ Example of input file:
+ ```
+ g e f f e r t
+ p r o s c r i b e d
+ p r o m i n e n t l y
+ j o c e l y n
+ m a r c e c a ' s
+ s t a n k o w s k i
+ m u f f l e
+ ```
+
+ Example of output file:
+ ```
+ G EH1 F ER0 T g e f f e r t G EH1 <DELETE> F <DELETE> ER0 T G EH1 <DELETE> F <DELETE> ER0 T PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ P R OW0 S K R AY1 B D p r o s c r i b e d P R OW0 S K R AY1 B <DELETE> D P R OW0 S K R AY1 B <DELETE> D PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ P R AA1 M AH0 N AH0 N T L IY0 p r o m i n e n t l y P R AA1 M AH0 N AH0 N T L IY0 P R AA1 M AH0 N AH0 N T L IY0 PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ JH AO1 S L IH0 N j o c e l y n JH AO1 S <DELETE> L IH0 N JH AO1 S <DELETE> L IH0 N PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ M AA0 R S EH1 K AH0 Z m a r c e c a ' s M AA0 R S EH1 K AH0 <DELETE> Z M AA0 R S EH1 K AH0 <DELETE> Z PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ S T AH0 NG K AO1 F S K IY0 s t a n k o w s k i S T AH0 NG K AO1 F S K IY0 S T AH0 NG K AO1 F S K IY0 PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ M AH1 F AH0L m u f f l e M AH1 <DELETE> F AH0_L <DELETE> M AH1 <DELETE> F AH0_L <DELETE> PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
+ ```
+
+ Note that the correct output tags are in the **third** column; the input is in the second column.
+ Tags correspond to input letters one-to-one. If you remove the `<DELETE>` tags and split multi-phoneme tags on `_` (as in `AH0_L`), you should get a CMU-like transcription.
+
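The post-processing of the third column can be sketched in Python as follows. This is an illustrative sketch under two assumptions that the README does not state explicitly: the columns of the output file are tab-separated, and `_` joins several phonemes predicted for a single letter. The function names `tags_to_phonemes` and `parse_output_line` are hypothetical.

```python
DELETE_TAG = "<DELETE>"


def tags_to_phonemes(tags: str) -> list:
    """Turn a space-separated tag string into a list of CMU-style phonemes."""
    phonemes = []
    for tag in tags.split():
        if tag == DELETE_TAG:
            continue  # this letter produces no phoneme
        # Assumption: "_" joins several phonemes emitted for one letter.
        phonemes.extend(p for p in tag.split("_") if p)
    return phonemes


def parse_output_line(line: str):
    """Return (input letters, phonemes) from one output line.

    Assumption: columns are tab-separated, with the input in the second
    column and the tags in the third.
    """
    columns = line.rstrip("\n").split("\t")
    letters = columns[1].split()
    return letters, tags_to_phonemes(columns[2])
```

With the tags from the last example line, `tags_to_phonemes("M AH1 <DELETE> F AH0_L <DELETE>")` yields `["M", "AH1", "F", "AH0", "L"]`.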
+ ### How to use for TTS
+
+