Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 7 days ago • 22
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21 • 58
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Paper • 2412.10704 • Published 12 days ago • 14
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018 • 16
Bamba Collection Collection of Bamba - hybrid Mamba2 model architecture based models trained on open data • 8 items • Updated 8 days ago • 16
Navarasa Collection Collection of Gemma finetuned 7B/ 2B Indic Navarasa models. • 4 items • Updated Mar 18 • 2
Navarasa 2.0 Models Collection Collection of models Navarasa 2.0 Models finetuned with Gemma on 15 Indian languages • 5 items • Updated Mar 18 • 17
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions Paper • 2401.01827 • Published Jan 3 • 16
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 20 days ago • 48
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 20 days ago • 49
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago • 55
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published 28 days ago • 13
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 30 days ago • 47
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14 • 57