arxiv:2403.11755

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Published on Mar 18, 2024

Abstract

Prompt ensembling of Large Language Model (LLM)-generated category-specific prompts has emerged as an effective method to enhance the zero-shot recognition ability of Vision-Language Models (VLMs). To obtain these category-specific prompts, present methods rely on hand-crafting the prompts to the LLMs that in turn generate VLM prompts for the downstream tasks. However, this requires manually composing these task-specific prompts, and even then they might not cover the diverse set of visual concepts and task-specific styles associated with the categories of interest. To effectively take humans out of the loop and completely automate the prompt generation process for zero-shot recognition, we propose Meta-Prompting for Visual Recognition (MPVR). Taking as input only minimal information about the target task, in the form of its short natural-language description and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts, resulting in a strong zero-shot classifier. MPVR generalizes effectively across various popular zero-shot image recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs. For example, MPVR obtains a zero-shot recognition improvement over CLIP of up to 19.8% and 18.2% (5.0% and 4.5% on average over 20 datasets) leveraging GPT and Mixtral LLMs, respectively.
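To make the prompt-ensembling step concrete, below is a minimal sketch of zero-shot classification with LLM-generated, category-specific prompts, using the Hugging Face `transformers` CLIP API. The meta-prompting stage (querying an LLM such as GPT or Mixtral with the task description and class-label list) is mocked with a hardcoded dict; the prompt texts, class names, image path, and model checkpoint are illustrative assumptions, not the paper's exact setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-in for LLM-generated, category-specific prompts. In MPVR these would
# be produced automatically from the task description and class labels.
class_prompts = {
    "golden retriever": [
        "a photo of a golden retriever, a large dog with a wavy golden coat",
        "a close-up of a friendly golden retriever outdoors",
    ],
    "tabby cat": [
        "a photo of a tabby cat with striped grey-brown fur",
        "an indoor snapshot of a tabby cat lounging on furniture",
    ],
}

# Prompt ensembling: embed every prompt for a class, average the normalized
# embeddings, and re-normalize to get one classifier weight per class.
classifier_weights = []
for prompts in class_prompts.values():
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    mean_emb = text_emb.mean(dim=0)
    classifier_weights.append(mean_emb / mean_emb.norm())
classifier = torch.stack(classifier_weights)  # (num_classes, embed_dim)

# Zero-shot prediction: cosine similarity between the image embedding and
# each ensembled class embedding; the highest-scoring class wins.
image = Image.open("example.jpg")  # hypothetical input image
pixel = processor(images=image, return_tensors="pt")
with torch.no_grad():
    img_emb = model.get_image_features(**pixel)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
scores = img_emb @ classifier.T  # (1, num_classes)
print(list(class_prompts)[scores.argmax().item()])
```

Averaging in embedding space rather than concatenating prompt text is what lets a diverse prompt set sharpen each class prototype without changing the VLM itself.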
