Wonderful open source Italian dataset from @manalog and @ruggsea:
https://huggingface.co/datasets/manalog/UsenetArchiveIT
The dataset contributes to the https://huggingface.co/mii-community project, which aims to advance the creation of Italian open-source Language Models (LLMs). 🇮🇹 🤗 About 10-20 billion tokens, probably the best conversational open source dataset in the Italian language. 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹