- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 125
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65
Collections
Collections including paper arxiv:2402.05472
- Question Aware Vision Transformer for Multimodal Reasoning
  Paper • 2402.05472 • Published • 8
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 39
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
  Paper • 2402.05930 • Published • 38
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 51

- Attention Is All You Need
  Paper • 1706.03762 • Published • 49
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 9
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 86
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6