---
title: README
emoji: 🚀
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---

GitHub

English | 简体中文

👋 Join us on Twitter, Discord and WeChat


## News 🎉

- [2023/08] TurboMind supports 4-bit inference, 2.4x faster than FP16, the fastest open-source implementation 🚀.
- [2023/08] LMDeploy has launched on the HuggingFace Hub, providing ready-to-use 4-bit models (see the sketch after this list).
- [2023/08] LMDeploy supports 4-bit quantization using the AWQ algorithm.
- [2023/07] TurboMind supports Llama-2 70B with GQA.
- [2023/07] TurboMind supports Llama-2 7B/13B.
- [2023/07] TurboMind supports tensor-parallel inference of InternLM.
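
For readers who want to try the 4-bit models mentioned above, the sketch below shows one possible way to load and query them. It is a minimal sketch, not the project's documented quick start: it assumes lmdeploy's Python `pipeline` API and uses a hypothetical Hub model ID, neither of which is confirmed by this announcement, so consult the LMDeploy documentation for the exact commands.

```python
# Minimal sketch: assumes lmdeploy's `pipeline` API and a hypothetical
# 4-bit AWQ model repository on the HuggingFace Hub.
from lmdeploy import pipeline

# Load a 4-bit (AWQ) quantized chat model; the repo name is illustrative.
pipe = pipeline("lmdeploy/llama2-chat-7b-w4")

# Run a prompt through the TurboMind inference engine.
responses = pipe(["Introduce yourself in one sentence."])
print(responses[0].text)
```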