---
license: mit
language:
- ru
library_name: transformers
tags:
- text-generation-inference
---

# text-normalization-ru-new
Normalization for Russian text. I couldn't find any existing solutions (apart from rule-based algorithms, which I don't like), so I made this.
It was designed for the Silero TTS model, which can't handle English words and numbers in Russian text-to-speech.

This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small), trained on the Kaggle dataset at https://www.kaggle.com/c/text-normalization-challenge-russian-language and on an additional dataset that I prepared from typical messages.

It achieves the following results on the evaluation set:
- Loss: 0.0177
- Mean Distance: 0
- Max Distance: 15
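
The card does not say how the distance metrics are computed. As a rough sketch, assuming they are character-level Levenshtein (edit) distances between each prediction and its reference, the mean and maximum could be obtained like this. The function names `levenshtein` and `distance_stats` are illustrative and do not come from the training code. Under this assumption, a mean of 0 next to a maximum of 15 would suggest the reported mean is rounded, though that is only a guess.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def distance_stats(predictions: Sequence[str],
                   references: Sequence[str]) -> Tuple[float, int]:
    """Return (mean distance, max distance) over paired strings."""
    distances = [levenshtein(p, r) for p, r in zip(predictions, references)]
    return sum(distances) / len(distances), max(distances)
```

A call such as `distance_stats(model_outputs, gold_texts)` on the evaluation set would then give the two numbers, with both arguments being lists of strings of equal length.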

## Model description

Tiny T5 fine-tuned for normalizing Russian texts:
- translating numbers into words
- expanding abbreviations into phonetic letter combinations
- transliterating English into Russian letters
- whatever else was in the dataset (see below)
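
A minimal usage sketch with `transformers`, under a few assumptions: `MODEL_ID` is a hypothetical placeholder (replace the namespace with the actual repository owner), the raw text is passed without any task prefix since the card does not mention one, and the `max_new_tokens` default is arbitrary. The helper names `load` and `normalize` are illustrative only.

```python
# Hypothetical repository id: the namespace is a placeholder, not the real owner.
MODEL_ID = "your-namespace/text-normalization-ru-new"


def load(model_id: str = MODEL_ID):
    """Load the tokenizer and seq2seq model from the Hugging Face Hub."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def normalize(text: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Run one text through the model and decode the generated sequence."""
    # Assumption: no task prefix is needed; the card does not specify one.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the model, so it is left commented out):
# tokenizer, model = load()
# print(normalize("Встреча в 15:30, обсудим API.", tokenizer, model))
```

The normalized string can then be handed to the TTS model in place of the original text. For long messages it may be worth splitting the input into sentences first, since a small T5 is likely to handle short inputs more reliably.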