SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published 12 days ago • 8
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published 14 days ago • 41
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16 • 39
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20 • 12
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14 • 53
Article RegMix: Data Mixture as Regression for Language Model Pre-training By SivilTaram • Jul 11 • 10
Article MInference 1.0: 10x Faster Million Context Inference with a Single GPU By liyucheng • Jul 11 • 12