Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Paper • 2410.01699 • Published Oct 2 • 18
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21 • 14
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs Paper • 2401.03868 • Published Jan 8 • 1
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12 • 23
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation Paper • 2406.02540 • Published Jun 4 • 2
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization Paper • 2405.17873 • Published May 28 • 2
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published May 23 • 13
A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models Paper • 2312.07243 • Published Dec 12, 2023