PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Model description

PolyFormer is a unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) that formulates both tasks as a sequence-to-sequence (seq2seq) prediction problem. For more details, please refer to our paper:

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation. Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha. CVPR 2023.
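To make the polygon-generation framing concrete, the sketch below shows one way a generated sequence of polygon vertices could be turned back into a segmentation mask. It is an illustration only, not the PolyFormer implementation: the flat `[x1, y1, x2, y2, ...]` layout, the normalization of coordinates to [0, 1], and all function names are assumptions made for this example, and the rasterizer is a plain even-odd ray-casting test in pure Python.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sequence_to_polygon(seq: List[float]) -> List[Point]:
    """Group a flat [x1, y1, x2, y2, ...] sequence into (x, y) vertices."""
    if len(seq) % 2 != 0:
        raise ValueError("coordinate sequence must have even length")
    return [(seq[i], seq[i + 1]) for i in range(0, len(seq), 2)]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(seq: List[float], height: int, width: int) -> List[List[int]]:
    """Rasterize normalized vertices into a height x width binary mask."""
    # Assumption: coordinates are normalized to [0, 1] relative to the image size.
    polygon = [(px * width, py * height) for px, py in sequence_to_polygon(seq)]
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical model output: a square covering the central region of the image.
example_sequence = [0.25, 0.25, 0.75, 0.25, 0.75, 0.75, 0.25, 0.75]
mask = polygon_to_mask(example_sequence, height=4, width=4)
```

For real images, a library rasterizer would be a more practical choice than the per-pixel loop shown here.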

Training data

We pre-train PolyFormer on the REC task using Visual Genome, RefCOCO, RefCOCO+, RefCOCOg, and Flickr30k-entities, and then fine-tune it on the joint REC + RIS task using RefCOCO, RefCOCO+, and RefCOCOg.

PolyFormer-B: Swin-B as the visual encoder, BERT-base as the text encoder, 6 transformer encoder layers and 6 decoder layers.
PolyFormer-L: Swin-L as the visual encoder, BERT-base as the text encoder, 12 transformer encoder layers and 12 decoder layers.
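The two variants listed above can be summarized as a small configuration table. The snippet below is only a sketch for readers who want the numbers in machine-readable form: the `PolyFormerVariant` class and the `VARIANTS` dictionary are hypothetical names introduced here and are not taken from the PolyFormer codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyFormerVariant:
    """Hypothetical container for the variant settings listed in this card."""
    visual_encoder: str
    text_encoder: str
    encoder_layers: int
    decoder_layers: int


# Values copied from the variant descriptions in this model card.
VARIANTS = {
    "PolyFormer-B": PolyFormerVariant("Swin-B", "BERT-base", 6, 6),
    "PolyFormer-L": PolyFormerVariant("Swin-L", "BERT-base", 12, 12),
}

for name, cfg in VARIANTS.items():
    print(f"{name}: {cfg.visual_encoder} + {cfg.text_encoder}, "
          f"{cfg.encoder_layers} encoder / {cfg.decoder_layers} decoder layers")
```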

Citation

If you find PolyFormer useful in your research, please cite the following paper:

@article{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  journal={arXiv preprint arXiv:2302.07387},
  year={2023}
}
