license: mit
language:
- ru
library_name: transformers
tags:
- text-generation-inference
text-normalization-ru-new
Normalization for Russian text. I couldn't find any existing solutions (other than algorithmic ones, which I don't like), so I made this. It was designed for the Silero TTS model, which can't handle English words or numbers in Russian text-to-speech.
This model is a fine-tuned version of cointegrated/rut5-small, trained on the dataset from https://www.kaggle.com/c/text-normalization-challenge-russian-language and on an additional dataset I prepared from typical messages.
It achieves the following results on the evaluation set:
- Loss: 0.0177
- Mean Distance: 0
- Max Distance: 15
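The card does not define "Mean Distance" and "Max Distance". Assuming they are character-level Levenshtein distances between the model's output and the reference normalization, computed per example and then averaged or maximized over the evaluation set, they could be computed roughly as in the sketch below. This is not the author's evaluation code, and `levenshtein` and `distance_stats` are hypothetical helper names.

```python
from statistics import mean


# Assumption: the card's "distance" is character-level Levenshtein distance.
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def distance_stats(predictions: list[str], references: list[str]) -> tuple[float, int]:
    """Return (mean, max) edit distance over aligned prediction/reference pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("need at least one pair")
    dists = [levenshtein(p, r) for p, r in zip(predictions, references)]
    return mean(dists), max(dists)


if __name__ == "__main__":
    preds = ["сто двадцать три", "привет"]
    refs = ["сто двадцать три", "привет мир"]
    print(distance_stats(preds, refs))
```

Under this reading, a mean of 0 alongside a max of 15 would mean almost every output matches its reference exactly, with a few outliers far off.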
Model description
Tiny T5 model trained to normalize Russian text:
- translating numbers into words
- expanding abbreviations into phonetic letter combinations
- transliterating English into Russian letters
- whatever else was in the dataset (see below)