Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 7 days ago • 39
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Paper • 2412.19806 • Published Oct 8, 2024 • 1
Faithful Logical Reasoning via Symbolic Chain-of-Thought Paper • 2405.18357 • Published May 28, 2024 • 2
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval Paper • 2411.04752 • Published Nov 7, 2024 • 16
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27, 2024 • 53