Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 65
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 30
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 65
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 19
General Object Foundation Model for Images and Videos at Scale Paper • 2312.09158 • Published Dec 14, 2023 • 8
EGC: Image Generation and Classification via a Diffusion Energy-Based Model Paper • 2304.02012 • Published Apr 4, 2023 • 1
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling Paper • 2301.03580 • Published Jan 9, 2023 • 1
Exploring Transformers for Open-world Instance Segmentation Paper • 2308.04206 • Published Aug 8, 2023 • 1
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition Paper • 2203.02751 • Published Mar 5, 2022 • 1
ByteTrack: Multi-Object Tracking by Associating Every Detection Box Paper • 2110.06864 • Published Oct 13, 2021
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation Paper • 2304.09801 • Published Apr 19, 2023
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations Paper • 2202.07800 • Published Feb 16, 2022
Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training Paper • 2309.13942 • Published Sep 25, 2023 • 1
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7, 2024 • 39