Post
2835
The Cauldron is a massive collection of 50 high-quality datasets, all converted to the user/assistant format, and ready to use to fine-tune any Vision Language Model.
The Cauldron covers a wide range of tasks, including general visual question answering, counting, captioning, text transcription, document understanding, chart/figure understanding, table understanding, visual reasoning, geometry, spotting differences between 2 images or converting a screenshot to a code.
HuggingFaceM4/the_cauldron
The Cauldron covers a wide range of tasks, including general visual question answering, counting, captioning, text transcription, document understanding, chart/figure understanding, table understanding, visual reasoning, geometry, spotting differences between 2 images or converting a screenshot to a code.
HuggingFaceM4/the_cauldron