See `unidisc/datasets/preprocessing` for instructions on how to preprocess datasets.
We support the following datasets:
- Cambrian
- CapsFusion
- CC12M
- DataComp1B
- JourneyDB
- LAION400M
- MMC4
- PixelProse
Additionally, we generated our own synthetic dataset and provide the [generation scripts](unidisc/datasets/preprocessing/unidisc_dataset/README.md) as well as the raw data.