PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published Apr 29 • 44
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published Mar 13 • 18
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 47
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28, 2024 • 10
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Paper • 2311.12631 • Published Nov 21, 2023 • 15