VictorSanh committed · Commit 4a3d381 · 1 Parent(s): 5215c66
Update README.md
README.md CHANGED
@@ -122,7 +122,7 @@ The model is trained on the following data mixture of openly accessible English
 | [LAION](https://huggingface.co/datasets/laion/laion2B-en) | Image-Text Pairs | 29.9B | 1.120B | 1 | 17.18%
 | [PMD](https://huggingface.co/datasets/facebook/pmd) | Image-Text Pairs | 1.6B | 70M | 3 | 2.82% | |
 
-**OBELICS** is an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images. An interactive visualization of the dataset content is available [here](TODO).
+**OBELICS** is an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images. An interactive visualization of the dataset content is available [here](TODO). (https://atlas.nomic.ai/map/259c207e-a228-445b-af77-281c84f8bd52/1211f37e-6c31-4dab-80ba-fdb02dfc1a51 -> this is an early, non-final version)
 
 **Wkipedia** is the multimodal equivalent of the encyclopedia. We used the English dump of Wikipedia created on February 20th, 2023.
 