InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 2025
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023