@giux78 on Hugging Face: "Wonderful open source Italian dataset from @manalog and @ruggsea:…"

giux78

posted an update Mar 13

Wonderful open source Italian dataset from @manalog and @ruggsea :

https://huggingface.co/datasets/manalog/UsenetArchiveIT

The dataset contributes to the https://huggingface.co/mii-community project, which aims to advance the creation of open-source Italian Large Language Models (LLMs). 🇮🇹 🤖 At about 10-20 billion tokens, it is probably the best open-source conversational dataset in the Italian language. 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹

ruggsea

Mar 15

AFAIK, this could be the biggest Italian-language dataset on Hugging Face, and probably one of the biggest Italian text datasets ever (excluding Common Crawl-based datasets).

In this post

giux78 Alessandro Ercolani
ruggsea Ruggero Marino Lazzaroni