LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 6 days ago • 139
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 6 days ago • 13
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 11 days ago • 134
Article What is test-time compute and how to scale it? By Kseniase and 1 other • 20 days ago • 42
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control 23 days ago • 109
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 27 days ago • 19
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published 29 days ago • 26
Article **Topic 24: What is Cosmos World Foundation Model Platform?** By Kseniase and 1 other • Jan 23 • 6
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 46
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 61
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 258
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 5 • 26
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 90
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • Jan 2 • 40
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37