Datasets with an image, a prompt question (like "describe this image") and an answer Can be used to train VLMs.