"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 46 • 3
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models Paper • 2203.07259 • Published Mar 14, 2022 • 3
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024 • 7
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 46
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 46
Compressed LLMs from the Community Collection LLMs optimized by the community using Neural Magic's LLM Compressor for efficient deployment in vLLM. Contribute and help advance efficient AI! • 3 items • Updated Sep 26, 2024 • 2
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17, 2024 • 61