Hugging Face
ingyu's Collections
Inference Optimization
Model Compression
Quantization
PEFT

Inference Optimization

Updated Aug 7, 2024

  • The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

    Paper • arXiv 2408.01050 • Published Aug 2, 2024 • 9 upvotes

  • Efficient Inference of Vision Instruction-Following Models with Elastic Cache

    Paper • arXiv 2407.18121 • Published Jul 25, 2024 • 17 upvotes

  • LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Paper • arXiv 2407.14057 • Published Jul 19, 2024 • 47 upvotes

  • Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

    Paper • arXiv 2407.10969 • Published Jul 15, 2024 • 23 upvotes

  • Inference Performance Optimization for Large Language Models on CPUs

    Paper • arXiv 2407.07304 • Published Jul 10, 2024 • 54 upvotes