arXiv:2409.01437

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Published on Sep 2

Submitted by SushantGautam on Sep 4

#3 Paper of the day

Authors: Sushant Gautam, Cise Midoglu, Steven A. Hicks, Vajira Thambawita

Abstract

We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics. The dataset comprises 6,500 annotated images spanning various GI tract conditions and surgical instruments, and it supports multiple question types, including yes/no, choice, location, and numerical count. It is intended for applications such as image captioning, Visual Question Answering (VQA), text-based generation of synthetic medical images, object detection, and classification. Our experiments demonstrate the dataset's effectiveness in training models for three selected tasks, showcasing significant applications in medical image analysis and diagnostics. We also present evaluation metrics for each task, highlighting the usability and versatility of our dataset. The dataset and supporting artifacts are available at https://datasets.simula.no/kvasir-vqa.
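
To make the question types listed in the abstract concrete, the following is a minimal sketch of how one question-answer annotation might be represented and tallied by type. The `QAPair` class, its field names, the `count_by_type` helper, and the sample records are all hypothetical and chosen for illustration; they are not the dataset's actual schema or contents, which should be taken from the dataset page.

```python
from collections import Counter
from dataclasses import dataclass

# The four question types named in the abstract.
QUESTION_TYPES = ("yes/no", "choice", "location", "numerical count")


@dataclass(frozen=True)
class QAPair:
    """One question-answer annotation for an image (hypothetical schema)."""
    image_id: str       # hypothetical identifier, not the dataset's real key
    question_type: str  # must be one of QUESTION_TYPES
    question: str
    answer: str

    def __post_init__(self):
        # Reject annotations whose type is not one of the four listed types.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")


def count_by_type(pairs):
    """Return the number of annotations per question type, zeros included."""
    counts = Counter(p.question_type for p in pairs)
    return {qt: counts.get(qt, 0) for qt in QUESTION_TYPES}


# Placeholder records for illustration only; not taken from the dataset.
examples = [
    QAPair("img_001", "yes/no", "Is an instrument visible?", "yes"),
    QAPair("img_001", "numerical count", "How many polyps are visible?", "1"),
    QAPair("img_002", "location", "Where in the image is the finding?", "center"),
]

print(count_by_type(examples))
```

Keeping the question type as an explicit field lets a single image carry several annotations of different kinds, which is one plausible way to support the mix of tasks the abstract describes.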

Community

SushantGautam (paper author, paper submitter) · Sep 4, edited Sep 4

The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. This dataset is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA) and text-based generation of synthetic medical images.
https://datasets.simula.no/kvasir-vqa/