Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 63
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 604
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16 • 24
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23 • 21
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22 • 82
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2 • 41
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31 • 61
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30 • 35
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26 • 69
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26 • 35
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI Paper • 2401.14019 • Published Jan 25 • 21
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25 • 47
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 66
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 41
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257