DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published 10 days ago • 29
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation Paper • 2503.13070 • Published 9 days ago • 9
Gemma 3 Collection All versions of Google's new multimodal models in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 29 items • Updated about 9 hours ago • 43
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 30
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset 16 days ago • 68
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 15 days ago • 345
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published 14 days ago • 34
Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 11 items • Updated Jan 14 • 80
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 101
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 28 days ago • 572
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published Jan 16 • 48