MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published 24 days ago
BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Paper • 2412.03594 • Published Nov 29, 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25, 2024
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity Paper • 2309.10285 • Published Sep 19, 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks Paper • 2312.08583 • Published Dec 14, 2023