TUKE-KEMT/slavic-t5-base

	---
	datasets:
	- oscar
	- hieronymusa/MaCoCu-dataset-250k
	language:
	- cs
	- hr
	- pl
	- sl
	- sk
	---


	# Slavic T5 Base

	The aim of this model is to achieve the best possible results for Slavic languages written in Latin script.

	It is suitable for tasks such as:

	- summarization,
	- extractive question answering,
	- machine translation between Slavic languages written in Latin script.

	The model is trained on selected parts of the OSCAR corpus and the MaCoCu corpus.

	It supports the following languages: Czech, Croatian, Polish, Slovak, and Slovenian.

	The vocabulary has 120,000 tokens and is case-sensitive: it contains capital letters.
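
The tasks listed above all use a text-to-text interface, so a minimal sketch of loading the checkpoint and running it on one input may help. It assumes the model loads through the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes of the `transformers` library with a PyTorch backend. The helper names `load_slavic_t5` and `run_text2text` are hypothetical and are not part of this repository. The card does not say whether the checkpoint has been fine-tuned for any of these tasks, so treat the generated text as a smoke test rather than a usable result.

```python
# Assumption: the checkpoint loads with the generic seq2seq Auto classes.
MODEL_ID = "TUKE-KEMT/slavic-t5-base"


def load_slavic_t5(model_id: str = MODEL_ID):
    """Download (or read from the local cache) the tokenizer and the model."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def run_text2text(text: str, tokenizer, model, max_new_tokens: int = 32) -> str:
    """Encode `text`, generate with the model, and decode the first output."""
    # The vocabulary is case-sensitive, so the input is passed through unchanged.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_slavic_t5()` once and passing the returned tokenizer and model to `run_text2text` avoids reloading the weights for every input. For summarization, question answering, or translation, the same two objects would normally be passed to a fine-tuning loop first.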