Collections
Collections including paper arxiv:2410.20474
-
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
Paper • 2410.21220 • Published • 10
LongReward: Improving Long-context Large Language Models with AI Feedback
Paper • 2410.21252 • Published • 17
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
Paper • 2410.20474 • Published • 14
24🥇
EU AI Act Compliance Leaderboard
-
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
Paper • 2410.04932 • Published • 9
ControlAR: Controllable Image Generation with Autoregressive Models
Paper • 2410.02705 • Published • 9
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
Paper • 2410.13370 • Published • 35
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
Paper • 2410.20474 • Published • 14
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 27
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 29
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 11
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 11
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 26