Pre-training dataset prep

Collection by mpasila, updated Oct 26, 2024

Some datasets I should probably use.


  • JeanKaddour/minipile

    Updated Jun 20, 2023 • 1.01M rows • 1.86k downloads • 125 likes

  • wikimedia/wikipedia

    Updated Jan 9, 2024 • 61.6M rows • 37.2k downloads • 892 likes

  • neuralwork/arxiver

    Updated Nov 1, 2024 • 63.4k rows • 145 downloads • 363 likes

  • ohsuz/tiny-textbooks-edu

    Updated Jun 11, 2024 • 3.31M rows • 15 downloads • 1 like

  • ohsuz/tiny-code-textbooks-edu

    Updated Jun 11, 2024 • 1.84M rows • 3 downloads • 2 likes
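The collection names datasets for pre-training prep but gives no procedure. Below is a minimal sketch of one common prep step, assuming the Hugging Face datasets library is used for loading: stream the text column of JeanKaddour/minipile, then pack tokenized documents into fixed-length blocks (each document followed by an end-of-sequence id, trailing partial block dropped), similar to the "group texts" step in causal-LM training scripts. The helper names pack_token_ids and iter_minipile_text, the EOS-separator convention, and the block size are illustrative assumptions, not something this collection specifies; tokenization is left to whatever tokenizer you use.

```python
from typing import Iterable, Iterator, List


def pack_token_ids(
    docs: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Hypothetical helper: concatenate tokenized documents, appending eos_id
    after each one, and yield blocks of exactly block_size ids.
    Any leftover ids shorter than block_size at the end are dropped."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    buffer: List[int] = []
    for ids in docs:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]  # slice copies, so later edits don't alias
            del buffer[:block_size]


def iter_minipile_text(split: str = "train", limit: int = 1000) -> Iterator[str]:
    """Hypothetical helper: stream up to `limit` raw texts from
    JeanKaddour/minipile. Needs the `datasets` package and network access."""
    # Imported lazily so pack_token_ids can be used without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset("JeanKaddour/minipile", split=split, streaming=True)
    for i, row in enumerate(ds):
        if i >= limit:
            break
        yield row["text"]
```

For example, with any tokenizer that maps a string to a list of ids, pack_token_ids((encode(t) for t in iter_minipile_text()), 2048, eos_id) yields 2048-id training blocks; encode and eos_id are placeholders here. Other datasets in the list may need a config name (wikimedia/wikipedia is split into dated per-language subsets) or a different text column.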