Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding • arXiv:2307.15337 • Published Jul 28, 2023
DiTFastAttn: Attention Compression for Diffusion Transformer Models • arXiv:2406.08552 • Published Jun 12, 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation • arXiv:2406.02540 • Published Jun 4, 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression • arXiv:2406.14909 • Published Jun 21, 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization • arXiv:2405.17873 • Published May 28, 2024
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching • arXiv:2412.17153 • Published Dec 22, 2024
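To make the first entry concrete: Skeleton-of-Thought (arXiv:2307.15337) speeds up generation by first asking the model for a short skeleton of points and then expanding each point independently, so the expansions can be decoded in parallel rather than as one long sequential answer. Below is a minimal sketch of that two-stage pipeline, not the paper's reference implementation; the `complete` function is a hypothetical stand-in for whatever LLM completion API you use.

```python
# Sketch of the Skeleton-of-Thought (SoT) idea, assuming a generic LLM call.
from concurrent.futures import ThreadPoolExecutor

def complete(prompt: str) -> str:
    """Hypothetical LLM completion call; swap in your inference API."""
    raise NotImplementedError

def skeleton_of_thought(question: str, max_points: int = 5) -> str:
    # Stage 1: elicit a concise skeleton, one short point per line.
    skeleton = complete(
        f"Answer with a skeleton of at most {max_points} short points, "
        f"one per line, 3-5 words each.\nQuestion: {question}"
    )
    points = [p.strip() for p in skeleton.splitlines() if p.strip()]

    # Stage 2: expand every point concurrently; the expansions are
    # independent, which is what makes the decoding parallel.
    def expand(point: str) -> str:
        return complete(
            f"Question: {question}\nSkeleton point: {point}\n"
            "Expand this point into 1-2 sentences."
        )

    with ThreadPoolExecutor(max_workers=max(len(points), 1)) as pool:
        expansions = list(pool.map(expand, points))

    return "\n".join(expansions)
```

With batched inference or concurrent API calls, the wall-clock cost of Stage 2 is roughly that of the longest single expansion instead of the sum of all of them, which is where the speedup comes from.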