---
license: mit
language:
- ru
library_name: transformers
tags:
- text-generation-inference
---

# text-normalization-ru-new

Normalization for Russian text. I couldn't find any existing solutions (apart from rule-based algorithms, which I don't like), so I made this one.

It was designed for the Silero TTS model, which cannot handle English words and numbers in Russian text-to-speech.

This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small) on https://www.kaggle.com/c/text-normalization-challenge-russian-language and an additional dataset that I prepared from typical messages.

It achieves the following results on the evaluation set:

- Loss: 0.0177
- Mean Distance: 0
- Max Distance: 15

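The card does not say how the distance metrics are defined. As a rough illustration only, the sketch below assumes "Distance" means character-level Levenshtein (edit) distance between each prediction and its reference, and that the mean and maximum are taken over the evaluation examples. The function names `levenshtein` and `distance_stats` are hypothetical and are not taken from the training code.

```python
from statistics import mean
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def distance_stats(predictions: List[str], references: List[str]) -> Tuple[float, int]:
    """Mean and max edit distance over paired, non-empty lists."""
    distances = [levenshtein(p, r) for p, r in zip(predictions, references)]
    return mean(distances), max(distances)
```

Under this assumption, a mean close to 0 with a maximum of 15 would indicate that most outputs match their references exactly while a few differ by a whole word or more.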
## Model description

Tiny T5 model fine-tuned for normalizing Russian texts:

- translating numbers into words
- expanding abbreviations into phonetic letter combinations
- transliterating English into Russian letters
- whatever else was in the dataset (see below)

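A minimal sketch of how a model like this might be called through `transformers`, assuming it loads as a standard seq2seq T5 checkpoint and takes raw text with no task prefix (both are assumptions; the card does not state the input format). The repository id is not given here, so `repo_id` is a placeholder that you need to supply, and the helper names `load_model` and `normalize` are hypothetical.

```python
from typing import Any, List, Tuple


def load_model(repo_id: str) -> Tuple[Any, Any]:
    """Load tokenizer and model; repo_id is a placeholder for the real checkpoint."""
    # Imported lazily so normalize() can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    model.eval()  # inference only
    return tokenizer, model


def normalize(
    texts: List[str],
    tokenizer: Any,
    model: Any,
    max_new_tokens: int = 128,
) -> List[str]:
    """Run a batch of raw strings through the model and decode the outputs."""
    # Assumption: the model expects the raw text without any task prefix.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With a real checkpoint, the call would look like `normalize(["Встреча в 15:30"], *load_model(repo_id))`. The normalized strings can then be passed to a TTS model such as Silero.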