Article **Topic 24: What is Cosmos World Foundation Model Platform?** By Kseniase • 4 days ago • 6
Article Timm ❤️ Transformers: Use any timm model with transformers • 12 days ago • 34
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 17 days ago • 40
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 17 days ago • 59
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 19 days ago • 249
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published 23 days ago • 25
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 24 days ago • 87
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 25 days ago • 39
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024 • 16
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published Dec 19, 2024 • 14
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents Paper • 2412.13194 • Published Dec 17, 2024 • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 54
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published Dec 15, 2024 • 27
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 8