view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 12 days ago • 339
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 28 days ago • 403
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated Jan 8 • 565
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 208
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 354
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations Paper • 2501.03403 • Published Jan 6 • 4
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 135
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 126
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024 • 25