Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Paper • 2402.09398 • Published Feb 14, 2024
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation Paper • 2205.13542 • Published May 26, 2022
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF Paper • 2406.07971 • Published Jun 12, 2024
S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity Paper • 2412.06289 • Published Dec 9, 2024