---
datasets:
- oscar
- hieronymusa/MaCoCu-dataset-250k
language:
- cs
- hr
- pl
- sl
- sk
---

# Slavic T5 Base

The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.

It is suitable for tasks such as:

- summarization,
- extractive question answering,
- machine translation between Slavic languages in Latin script.

The model is trained on selected parts of the OSCAR corpus and the MaCoCu corpus.

It supports the following languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary has 120,000 tokens and is cased (it contains capital letters).
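
A minimal usage sketch with the Hugging Face `transformers` library may help here. It rests on assumptions this card does not confirm: `MODEL_ID` is a placeholder (the repository name is not stated above), the checkpoint is assumed to load through the generic seq2seq auto classes, and the helper names `is_supported` and `generate` are hypothetical. A base T5 checkpoint typically needs task-specific fine-tuning before its outputs are useful for summarization, question answering, or translation.

```python
from typing import List

# Language codes for the five languages listed in this card.
SUPPORTED_LANGUAGES = {
    "cs": "Czech",
    "hr": "Croatian",
    "pl": "Polish",
    "sk": "Slovak",
    "sl": "Slovenian",
}

# Placeholder: the card does not state the repository name.
MODEL_ID = "your-namespace/slavic-t5-base"


def is_supported(lang_code: str) -> bool:
    """Return True if the language code is one the card lists as supported."""
    return lang_code.lower() in SUPPORTED_LANGUAGES


def generate(texts: List[str], model_id: str = MODEL_ID,
             max_new_tokens: int = 64) -> List[str]:
    """Run seq2seq generation on a batch of input strings."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Pad and truncate so inputs of different lengths fit in one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Because the vocabulary is cased, input text should be passed with its original capitalization rather than lowercased.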