---
license: cc-by-4.0
language:
- ru
library_name: nemo
pipeline_tag: token-classification
tags:
- G2P
- Grapheme-to-Phoneme
---

# Russian G2P token classification model

This is a non-autoregressive model for Russian grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in IPA format.

The initial data was built from the Wiktionary JSON available at https://kaikki.org/dictionary/Russian/index.html

## Intended uses & limitations

The input is expected to consist of Cyrillic letters separated by spaces. Real spaces in the text should be replaced with an underscore (`_`).

Note that the model was trained on single words and some short phrases. It can accept longer phrases, but its accuracy may degrade on them.
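
As a rough illustration of the input format described above, the following Python sketch turns ordinary text into space-separated letters, with an underscore standing in for each real space. The helper name `to_g2p_input` is hypothetical and not part of NeMo. Whether the model expects lowercase input is not stated here, so the sketch leaves letter case untouched.

```python
def to_g2p_input(text: str) -> str:
    """Format a word or short phrase as space-separated characters.

    Hypothetical helper: runs of whitespace become a single underscore,
    then every character (including '_' and '-') is separated by a space.
    """
    joined = "_".join(text.split())
    return " ".join(joined)


if __name__ == "__main__":
    for phrase in ["исход", "ганс-юрген"]:
        print(to_g2p_input(phrase))
```

Each returned string can be written as one line of the input file passed to the inference script.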

### How to use

Install NeMo.

Download `ru_g2p.nemo` (this model):

```bash
git lfs install
git clone https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large
```

Run inference:

```bash
# NEMO_ROOT should point to a local clone of the NeMo repository
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
  pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo \
  inference.from_file=input.txt \
  inference.out_file=output.txt \
  model.max_sequence_len=512 \
  inference.batch_size=128 \
  lang=ru
```

Example of input file:

```
и с х о д
т р а н с н е п т у н о в ы х
т е л я т н и к о в с к о е
ц а р с к о г о
к р о с х о ф
г а н с - ю р г е н
д а р д а н е л л
```

Example of output file:

```
ɪ s x 'o t	и с х о д	ɪ s x 'o t	ɪ s x 'o t	PLAIN PLAIN PLAIN PLAIN PLAIN
t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x	т р а н с н е п т у н о в ы х	t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x	t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə	т е л я т н и к о в с к о е	tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə	tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
t~s 'a r s k ə v ə	ц а р с к о г о	t~s 'a r s k ə v ə	t~s 'a r s k ə v ə	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
k r ɐ s x 'o f	к р о с х о ф	k r ɐ s x 'o f	k r ɐ s x 'o f	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
ɡ a n s 'ju r ɡʲ ɪ n	г а н с - ю р г е н	ɡ a n s _ 'ju r ɡʲ ɪ n	ɡ a n s _ 'ju r ɡʲ ɪ n	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
d ə r d ɐ n 'ɛ ɫ	д а р д а н е л л	d ə r d ɐ n 'ɛ ɫ <DELETE>	d ə r d ɐ n 'ɛ ɫ <DELETE>	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
```

Note that the correct output tags are in the **third** column; the input is in the second column.

Tags correspond one-to-one to the input letters. If you remove the `<DELETE>` tags, `+`, `~`, and spaces, you get an IPA-like transcription.

The model does not predict secondary stress. The primary stress mark (`'`) is placed directly before the stressed vowel. In some cases the stress mark can be missing.
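
The clean-up described above can be sketched in a few lines of Python. Both function names are hypothetical, and `ipa_from_output_line` assumes that the columns of the output file are tab-separated, which this card does not state explicitly. Adjust the delimiter if your output differs.

```python
def tags_to_ipa(tags: str) -> str:
    """Strip `<DELETE>`, `+`, `~`, and spaces from a tag string."""
    for piece in ("<DELETE>", "+", "~", " "):
        tags = tags.replace(piece, "")
    return tags


def ipa_from_output_line(line: str) -> str:
    """Return the transcription built from the third column of one line.

    Assumption: columns are separated by tab characters.
    """
    columns = line.rstrip("\n").split("\t")
    return tags_to_ipa(columns[2])


if __name__ == "__main__":
    print(tags_to_ipa("t~s 'a r s k ə v ə"))         # ts'arskəvə
    print(tags_to_ipa("d ə r d ɐ n 'ɛ ɫ <DELETE>"))  # dərdɐn'ɛɫ
```

An underscore in the tags (as in the `г а н с - ю р г е н` example) is left as is, since it is not listed among the symbols to remove.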

### How to use for TTS

See an example of an inference pipeline for G2P + FastPitch + HifiGAN in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).