DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Paper • 2111.14690 • Published Nov 29, 2021
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation Paper • 2406.09399 • Published Jun 13, 2024
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals Paper • 2011.12450 • Published Nov 25, 2020
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published Dec 4, 2024 • 32
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024 • 18
Liquid: Language Models are Scalable Multi-modal Generators Paper • 2412.04332 • Published Dec 5, 2024 • 2
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published Feb 7, 2025 • 24
Language as Queries for Referring Video Object Segmentation Paper • 2201.00487 • Published Jan 3, 2022
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 15 days ago • 29
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 70
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 69
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 21
General Object Foundation Model for Images and Videos at Scale Paper • 2312.09158 • Published Dec 14, 2023 • 12
EGC: Image Generation and Classification via a Diffusion Energy-Based Model Paper • 2304.02012 • Published Apr 4, 2023 • 1
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling Paper • 2301.03580 • Published Jan 9, 2023 • 1