---
license: cc-by-4.0
language:
- ru
library_name: nemo
pipeline_tag: token-classification
tags:
- G2P
- Grapheme-to-Phoneme
---

# Russian G2P token classification model

This is a non-autoregressive model for Russian grapheme-to-phoneme (G2P) conversion, based on the BERT architecture. It predicts phonemes in IPA format.
The initial training data was built from the Russian Wiktionary JSON dump at https://kaikki.org/dictionary/Russian/index.html


## Intended uses & limitations

The input is expected to consist of Cyrillic letters separated by spaces. Real spaces should be replaced with underscores (_).
Note that the model was trained on single words and some short phrases.
Though it can accept longer phrases, its accuracy may degrade on them.
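As a sketch of the input format described above, the helper below (hypothetical, not part of NeMo or this repository) converts a Russian word or phrase into a model input line: real spaces become underscores, and every character is separated by a space.

```python
def prepare_g2p_input(text: str) -> str:
    """Format a Russian word or phrase as a G2P model input line.

    Real spaces are replaced with underscores, then every character
    is separated by a single space.
    """
    return " ".join(text.lower().replace(" ", "_"))


print(prepare_g2p_input("исход"))       # и с х о д
print(prepare_g2p_input("ганс юрген"))  # г а н с _ ю р г е н
```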

### How to use

Install NeMo.

Download `ru_g2p.nemo` (this model):
```bash
git lfs install
git clone https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large
```

Run the inference script:

```bash
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
  pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo \
  inference.from_file=input.txt \
  inference.out_file=output.txt \
  model.max_sequence_len=512 \
  inference.batch_size=128 \
  lang=ru
```

Example of input file:
```
и с х о д
т р а н с н е п т у н о в ы х
т е л я т н и к о в с к о е
ц а р с к о г о
к р о с х о ф
г а н с - ю р г е н
д а р д а н е л л
```

Example of output file:
```
ɪ s x 'o t                          и с х о д                       ɪ s x 'o t                         ɪ s x 'o t                           PLAIN PLAIN PLAIN PLAIN PLAIN
t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x   т р а н с н е п т у н о в ы х   t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x   t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə   т е л я т н и к о в с к о е     tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə   tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
t~s 'a r s k ə v ə                  ц а р с к о г о                 t~s 'a r s k ə v ə                 t~s 'a r s k ə v ə                  PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
k r ɐ s x 'o f                      к р о с х о ф                   k r ɐ s x 'o f                     k r ɐ s x 'o f                      PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
ɡ a n s 'ju r ɡʲ ɪ n                г а н с - ю р г е н             ɡ a n s _ 'ju r ɡʲ ɪ n              ɡ a n s _ 'ju r ɡʲ ɪ n              PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
d ə r d ɐ n 'ɛ ɫ                    д а р д а н е л л               d ə r d ɐ n 'ɛ ɫ <DELETE>          d ə r d ɐ n 'ɛ ɫ <DELETE>            PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
```

Note that the correct output tags are in the **third** column; the input is in the second column.
Tags correspond to input letters in a one-to-one fashion. If you remove the `<DELETE>` tags, the `+` and `~` symbols, and the spaces, you get an IPA-like transcription.
The model does not predict secondary stress. The primary stress mark is placed directly before the stressed vowel. In some cases the stress may be missing.
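The post-processing above can be sketched as a small helper (hypothetical, not part of NeMo). It assumes the output columns are separated by runs of two or more spaces, as in the example, takes the tags from the third column, drops `<DELETE>` tags, and strips the `+` and `~` symbols before joining the phonemes.

```python
import re


def extract_ipa(output_line: str) -> str:
    """Recover an IPA-like transcription from one output line.

    Assumes columns are separated by two or more spaces and that
    the correct tags sit in the third column.
    """
    columns = re.split(r"\s{2,}", output_line.strip())
    tags = columns[2].split()
    # Drop <DELETE> tags, then remove the + and ~ symbols and join.
    phonemes = [t for t in tags if t != "<DELETE>"]
    return "".join(p.replace("+", "").replace("~", "") for p in phonemes)
```

For example, applied to the `царского` line above, this yields `ts'arskəvə`.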

### How to use for TTS

See example of inference pipeline for G2P + FastPitch + HifiGAN in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).