
See `unidisc/datasets/preprocessing` for instructions on how to preprocess datasets.

We support the following datasets:

  • Cambrian
  • CapsFusion
  • CC12M
  • DataComp1B
  • JourneyDB
  • LAION400M
  • MMC4
  • PixelProse

Additionally, we generated our own synthetic dataset and provide the generation scripts as well as the raw data.