Article • Introducing smolagents: simple agents that write actions in code • Dec 31, 2024
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 19 days ago
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs • Paper • 2412.21187 • Published Dec 30, 2024
PixMo • Collection • A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated Jan 6
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published Dec 24, 2024
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published Dec 20, 2024
PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published Dec 4, 2024
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance • Paper • 2412.02687 • Published Dec 3, 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • Paper • 2412.04424 • Published Dec 5, 2024
LLMs Do Not Think Step-by-step In Implicit Reasoning • Paper • 2411.15862 • Published Nov 24, 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge • Paper • 2411.16594 • Published Nov 25, 2024
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published Nov 25, 2024
Article • Decoding Strategies in Large Language Models • By mlabonne • Oct 29, 2024
Llama-3.1-Nemotron-70B • Collection • SOTA models on Arena Hard and RewardBench as of 1 Oct 2024 • 6 items • Updated 24 days ago