FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published Jan 16
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published Jan 12
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published Dec 30, 2024
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper • 2501.15369 • Published Jan 26
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Paper • 2501.12370 • Published Jan 21
Return of the Encoder: Maximizing Parameter Efficiency for SLMs Paper • 2501.16273 • Published Jan 27
Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published 12 days ago
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval Paper • 2503.00540 • Published 23 days ago
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding Paper • 2502.03183 • Published Feb 5
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models Paper • 2503.08686 • Published 13 days ago
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension Paper • 2503.08689 • Published 13 days ago
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published 14 days ago
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published 13 days ago