---
license: mit
language:
- ru
library_name: transformers
tags:
- text-generation-inference
---
# text-normalization-ru-new
Normalization for Russian text. I couldn't find any existing solutions (besides algorithmic ones, which I don't like), so I made this.
It was designed for the Silero TTS model, which can't handle English and numbers in Russian text-to-speech.
This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small) on the [Text Normalization Challenge - Russian Language](https://www.kaggle.com/c/text-normalization-challenge-russian-language) dataset from Kaggle and an additional dataset that I prepared from typical messages.
It achieves the following results on the evaluation set:
- Loss: 0.0177
- Mean Distance: 0
- Max Distance: 15
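
The card does not say which distance these numbers refer to. A minimal sketch, assuming it is the character-level Levenshtein (edit) distance between the model output and the reference text, and that "Mean Distance: 0" is a rounded value. The function names `levenshtein` and `distance_stats` are illustrative and not part of the training code.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def distance_stats(predictions: Sequence[str],
                   references: Sequence[str]) -> Tuple[float, int]:
    """Return (mean distance, max distance) over prediction/reference pairs."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need two non-empty sequences of equal length")
    distances = [levenshtein(p, r) for p, r in zip(predictions, references)]
    return sum(distances) / len(distances), max(distances)
```

Under this assumption, a mean near 0 with a max of 15 would mean most evaluation samples match the reference exactly while a few differ by a whole word or more.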
## Model description
Tiny T5 trained from scratch for normalizing Russian texts:
- translating numbers into words
- expanding abbreviations into phonetic letter combinations
- transliterating English into Russian letters
- whatever else was in the dataset (see below)
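
A minimal usage sketch with `transformers`. The repo ID below is a placeholder (the owner namespace is not given here), and it assumes the model takes raw text with no task prefix; check both before relying on it. `load` and `normalize` are illustrative helper names.

```python
# Placeholder: replace with the actual "<owner>/text-normalization-ru-new" repo ID.
MODEL_ID = "your-namespace/text-normalization-ru-new"


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model; needs `transformers`, `torch` and `sentencepiece`."""
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model


def normalize(text: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Run one text through the model and decode the generated sequence.

    Assumption: the input is passed as-is, without a task prefix.
    """
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model, so it is left commented out):
# tokenizer, model = load()
# print(normalize("Встреча в 15:30, купи 2 кг яблок.", tokenizer, model))
```

The normalized string can then be passed to the TTS model in place of the original text. Long messages may need to be split into sentences first, since `max_new_tokens` caps the output length.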