Collections
Discover the best community collections!
Collections trending this week
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- OneBit: Towards Extremely Low-bit Large Language Models
  Paper • 2402.11295 • Published • 24
- A Survey on Transformer Compression
  Paper • 2402.05964 • Published
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
  Paper • 2402.08958 • Published • 6
- Direct Language Model Alignment from Online AI Feedback
  Paper • 2402.04792 • Published • 31
- Suppressing Pink Elephants with Direct Principle Feedback
  Paper • 2402.07896 • Published • 11
- Reformatted Alignment
  Paper • 2402.12219 • Published • 18
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 153