A Simple LMM baseline with Dynamic Visual Resolution
-
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Paper • 2406.04334 • Published -
menglc/SliMM-Qwen2-0.5B
Image-Text-to-Text • Updated • 8 -
menglc/SliMM-DeepStackM-Qwen2-0.5B
Image-Text-to-Text • Updated • 7 -
menglc/SliMM-DeepStackE-Qwen2VL-2B
Image-Text-to-Text • Updated • 27