FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture Paper • 2406.11030 • Published Jun 16
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Paper • 2406.02265 • Published Jun 4
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study Paper • 2302.06555 • Published Feb 13, 2023