---
title: README
emoji: 🚀
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---
<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64ccdc322e592905f922a06e/VhwQtaklohkUXFWkjA-3M.png" width="450"/>

[GitHub](https://github.com/InternLM/lmdeploy)

English | [简体中文](https://github.com/InternLM/lmdeploy/blob/main/README_zh-CN.md)

</div>

<p align="center">
    👋 Join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a>, and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a>
</p>

______________________________________________________________________

## News 🎉

- \[2023/08\] TurboMind supports 4-bit inference, running 2.4x faster than FP16, the fastest open-source implementation 🚀.
- \[2023/08\] LMDeploy has launched on the [Hugging Face Hub](https://huggingface.co/lmdeploy), providing ready-to-use 4-bit models.
- \[2023/08\] LMDeploy supports 4-bit quantization using the [AWQ](https://arxiv.org/abs/2306.00978) algorithm.
- \[2023/07\] TurboMind supports Llama-2 70B with grouped-query attention (GQA).
- \[2023/07\] TurboMind supports Llama-2 7B/13B.
- \[2023/07\] TurboMind supports tensor-parallel inference of InternLM.

______________________________________________________________________