"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 49
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Paper • 2405.15593 • Published May 24, 2024 • 1
Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning Paper • 2407.10994 • Published Jun 24, 2024 • 2
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search Paper • 2410.14649 • Published Oct 18, 2024 • 9
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8-dynamic Text Generation • Updated Oct 19, 2024 • 7.57k • 5
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic Text Generation • Updated Oct 19, 2024 • 4.94k • 6
neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic Text Generation • Updated Oct 19, 2024 • 174 • 14