---
title: README
emoji: 🚀
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---

<div align="center">

<img src="https://cdn-uploads.huggingface.co/production/uploads/64ccdc322e592905f922a06e/VhwQtaklohkUXFWkjA-3M.png" width="450"/>

[GitHub](https://github.com/InternLM/lmdeploy)

English | [简体中文](https://github.com/InternLM/lmdeploy/blob/main/README_zh-CN.md)

</div>

<p align="center">
👋 Join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a>, and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a>
</p>

______________________________________________________________________

## News 🎉

- \[2023/08\] TurboMind supports 4-bit inference, 2.4x faster than FP16 and the fastest among open-source implementations 🚀.
- \[2023/08\] LMDeploy has launched on the [Hugging Face Hub](https://huggingface.co/lmdeploy), providing ready-to-use 4-bit models; see the sketch after this list.
- \[2023/08\] LMDeploy supports 4-bit quantization using the [AWQ](https://arxiv.org/abs/2306.00978) algorithm.
- \[2023/07\] TurboMind supports Llama-2 70B with GQA.
- \[2023/07\] TurboMind supports Llama-2 7B/13B.
- \[2023/07\] TurboMind supports tensor-parallel inference of InternLM.
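
The 4-bit checkpoints mentioned above can be driven from Python. Below is a minimal sketch, assuming a recent `lmdeploy` release that exposes the high-level `pipeline` API; the model ID is a placeholder, so substitute an actual 4-bit checkpoint from the [lmdeploy Hub page](https://huggingface.co/lmdeploy):

```python
# Minimal sketch: chat with a 4-bit AWQ model through LMDeploy.
# Assumes a recent `lmdeploy` release that ships the `pipeline` API;
# the model ID below is a hypothetical placeholder, not a confirmed
# checkpoint name.
from lmdeploy import pipeline

pipe = pipeline("lmdeploy/llama2-chat-7b-w4")  # hypothetical 4-bit model ID
responses = pipe(["What does 4-bit weight quantization trade off?"])
print(responses[0].text)
```

If those assumptions hold, the same three lines also serve FP16 checkpoints; only the model ID changes.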

______________________________________________________________________