arxiv:2311.16515

Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval

Published on Nov 25, 2023

Abstract

A new task and dataset for composed person retrieval combining visual and textual queries are introduced, along with a novel feature alignment framework for improved performance.

AI-generated summary

Person retrieval has attracted rising attention. Existing methods fall mainly into two retrieval modes: image-only and text-only. Neither makes full use of the available information, and both struggle to meet diverse application requirements. To address these limitations, we propose a new Composed Person Retrieval (CPR) task, which combines visual and textual queries to identify individuals of interest in large-scale person image databases. The foremost difficulty of the CPR task is the lack of annotated datasets. We therefore first introduce a scalable automatic data synthesis pipeline, which decomposes complex multimodal data generation into the creation of textual quadruples followed by identity-consistent image synthesis using fine-tuned generative models. A multimodal filtering method is also designed to ensure that the resulting SynCPR dataset retains 1.15 million high-quality, fully synthetic triplets. In addition, to improve the representation of composed person queries, we propose a novel Fine-grained Adaptive Feature Alignment (FAFA) framework based on fine-grained dynamic alignment and masked feature reasoning. For objective evaluation, we manually annotate the Image-Text Composed Person Retrieval (ITCPR) test set. Extensive experiments demonstrate the effectiveness of the SynCPR dataset and the superiority of the proposed FAFA framework over state-of-the-art methods. All code and data will be provided at https://github.com/Delong-liu-bupt/Composed_Person_Retrieval.
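To make the composed-query idea concrete, here is a minimal sketch of retrieval with a combined image and text query. It is not the FAFA framework: the weighted-sum fusion, the `alpha` parameter, the function names, and the toy three-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from trained image and text encoders.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(image_emb: Sequence[float],
                  text_emb: Sequence[float],
                  alpha: float = 0.5) -> List[float]:
    """Fuse a reference-image embedding with a text embedding.

    Assumption: a simple weighted sum stands in for a learned fusion module.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    fused = [alpha * i + (1.0 - alpha) * t for i, t in zip(img, txt)]
    return l2_normalize(fused)


def rank_gallery(query: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery IDs sorted by cosine similarity to the query, best first."""
    scores = {
        pid: sum(q * g for q, g in zip(query, l2_normalize(emb)))
        for pid, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy gallery of hypothetical person-image embeddings.
gallery = {
    "person_a": [1.0, 0.0, 0.0],  # matches the reference image only
    "person_b": [0.7, 0.7, 0.0],  # matches both the image and the text
    "person_c": [0.0, 0.0, 1.0],  # matches neither
}

query = compose_query(image_emb=[1.0, 0.0, 0.0], text_emb=[0.0, 1.0, 0.0])
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the gallery entry that agrees with both the reference image and the text ranks first, which is the behavior a composed query is meant to produce over an image-only or text-only query.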
