arxiv:2403.11755

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Published on Mar 18, 2024

Abstract

Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance the zero-shot recognition ability of Vision-Language Models (VLMs). To obtain these category-specific prompts, present methods rely on hand-crafting the prompts given to the LLMs, which in turn generate the VLM prompts for the downstream tasks. However, this requires manually composing these task-specific prompts, and even then they might not cover the diverse set of visual concepts and task-specific styles associated with the categories of interest. To effectively take humans out of the loop and completely automate the prompt generation process for zero-shot recognition, we propose Meta-Prompting for Visual Recognition (MPVR). Taking as input only minimal information about the target task, in the form of a short natural language description and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts, resulting in a strong zero-shot classifier. MPVR generalizes effectively across various popular zero-shot image recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs. For example, MPVR obtains a zero-shot recognition improvement over CLIP of up to 19.8% and 18.2% (5.0% and 4.5% on average over 20 datasets) leveraging GPT and Mixtral LLMs, respectively.
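
The abstract does not spell out how prompt ensembling is computed, so the following is only a minimal sketch of one common formulation: embed every prompt for a class, average the normalized embeddings into a single class vector, and pick the class whose vector has the highest cosine similarity to the image embedding. The function names and the tiny 3-dimensional vectors are hypothetical stand-ins for the outputs of a VLM's text and image encoders; they are not taken from the paper or from any library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def ensemble_class_embedding(prompt_embeddings):
    """Average the normalized prompt embeddings of one class, then renormalize."""
    normalized = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_classify(image_embedding, class_prompt_embeddings):
    """Return the best-scoring label and the cosine score for every class."""
    image = l2_normalize(image_embedding)
    scores = {}
    for label, embeddings in class_prompt_embeddings.items():
        class_vec = ensemble_class_embedding(embeddings)
        scores[label] = sum(a * b for a, b in zip(image, class_vec))
    return max(scores, key=scores.get), scores


# Toy vectors (assumption): in practice these would come from a VLM's
# text encoder applied to several LLM-generated prompts per class.
toy_prompt_embeddings = {
    "cat": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}
toy_image_embedding = [0.7, 0.3, 0.05]  # stand-in for an image encoder output

predicted, scores = zero_shot_classify(toy_image_embedding, toy_prompt_embeddings)
print(predicted, scores)
```

In this formulation, adding more diverse prompts per class only changes the averaged class vector, which is why an automated prompt generator can be swapped in without touching the classifier itself.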
