---
datasets:
- oscar
- hieronymusa/MaCoCu-dataset-250k
language:
- cs
- hr
- pl
- sl
- sk
---


# Slavic T5 Base

The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.

It is suitable for tasks such as:

- summarization,
- extractive question answering,
- machine translation between Slavic languages written in Latin script.

The model was trained on selected parts of the OSCAR and MaCoCu corpora.

It supports the following languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary has 120 000 tokens and includes capital letters.
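
The vocabulary properties can be checked with a short script. The following is a minimal sketch, assuming the tokenizer is published on the Hugging Face Hub and loadable with `AutoTokenizer` from the `transformers` library. The repository id `your-namespace/slavic-t5-base` is a placeholder, since this card does not state the real one, and the helper names `vocab_stats` and `load_vocab` are hypothetical.

```python
from typing import Dict

# Placeholder: this card does not state the Hub repository id.
MODEL_ID = "your-namespace/slavic-t5-base"  # hypothetical, replace with the real id


def vocab_stats(vocab: Dict[str, int]) -> Dict[str, int]:
    """Return the vocabulary size and the number of tokens containing an uppercase letter."""
    with_uppercase = sum(
        1 for token in vocab if any(ch.isupper() for ch in token)
    )
    return {"size": len(vocab), "with_uppercase": with_uppercase}


def load_vocab(model_id: str = MODEL_ID) -> Dict[str, int]:
    """Download the tokenizer and return its token-to-id mapping."""
    # Imported lazily so vocab_stats() works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.get_vocab()
```

With a valid repository id and network access, `vocab_stats(load_vocab())` returns the token count and the number of tokens that contain a capital letter, which can be compared with the figures stated above. The reported size may include special tokens.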