The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Abstract
Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs, as measured by perplexity or simple accuracy on commonsense knowledge QA and basic arithmetic reasoning tasks. In this blog post, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose a lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of matching the performance of the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression methods must possess, which are currently overlooked in existing approaches.
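The hypothesis can be sketched more formally as follows. The notation below is an illustrative assumption rather than the authors' own formulation: $f_{\theta}$ denotes the original LLM, $g_{\phi}$ a smaller lottery LLM, $\mathcal{A}$ a reasoning procedure that may use multi-step reasoning, an external knowledge base $\mathcal{D}$, and external tools $\mathcal{T}$, and $P(\cdot)$ a task performance measure.
% Illustrative sketch of the lottery LLM hypothesis (notation assumed, not from the original text).
\[
\forall q \in \mathcal{Q}:\quad
P\!\left(\mathcal{A}\big(g_{\phi}, q, \mathcal{D}, \mathcal{T}\big)\right)
\;\geq\;
P\!\left(f_{\theta}(q)\right),
\qquad |\phi| \ll |\theta|,
\]
where $\mathcal{Q}$ is the set of queries for the given task, i.e., the smaller model with algorithmic and tool assistance is hypothesized to perform at least as well as the original model answering directly.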
Community
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference (2025)
- Can LLMs Maintain Fundamental Abilities under KV Cache Compression? (2025)
- DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance (2025)
- AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference (2025)
- LightThinker: Thinking Step-by-Step Compression (2025)
- RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression (2025)
- SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention (2025)