Visual Question Answering (VQA) is a task in which an AI system is expected to answer a question about a given image. VQA has been an active area of research for the past 4-5 years, with most datasets using natural images found online; two examples are VQAv2 and GQA. VQA is a particularly interesting multi-modal machine learning challenge because it has applications across several domains, including healthcare chatbots and interactive agents. However, most VQA challenges and datasets deal with English-only captions and questions.
In addition, even recently proposed approaches to VQA are generally hard to follow, because the CNN-based object detectors they rely on for feature extraction are relatively difficult to use and add complexity. Click on the expandable region below to see the steps for a FasterRCNN-based approach.