TheBloke committed
Commit bd661e6 · 1 Parent(s): 7a04423

Upload README.md

Files changed (1): README.md +11 -1
README.md CHANGED
@@ -50,13 +50,23 @@ tags:
 This repo contains AWQ model files for [Charles Goddard's MixtralRPChat ZLoss](https://huggingface.co/chargoddard/MixtralRPChat-ZLoss).


+ **MIXTRAL AWQ**
+
+ This is a Mixtral AWQ model.
+
+ For AutoAWQ inference, please install AutoAWQ from source.
+
+ Support via Transformers is coming soon, via this PR: https://github.com/huggingface/transformers/pull/27950, which should be merged to Transformers `main` very soon.
+
+ Support via vLLM and TGI has not yet been confirmed.
+
 ### About AWQ

 AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.

 AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.

- It is supported by:
+ AWQ models are supported by (note that not all of these may support Mixtral models yet):

 - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
 - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types.
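As context for the "please install AutoAWQ from source" note added in this diff, here is a minimal AutoAWQ inference sketch. It is a sketch only: the repo id below is an assumption (the diff does not name it), and it presumes a CUDA GPU and AutoAWQ's `AutoAWQForCausalLM.from_quantized` loader.

```python
# Minimal AutoAWQ inference sketch. Assumptions: a CUDA GPU, AutoAWQ
# installed from source (per the note in the diff), and the repo id below,
# which is hypothetical and not confirmed by this commit.
#
# Source install, roughly:
#   git clone https://github.com/casper-hansen/AutoAWQ
#   cd AutoAWQ && pip install -e .

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/MixtralRPChat-ZLoss-AWQ"  # hypothetical repo id

# Load the 4-bit AWQ checkpoint; fuse_layers fuses attention/MLP modules
# for faster inference where supported.
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Describe AWQ quantization in one paragraph."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Standard sampling-based generation on the quantized model.
output = model.generate(input_ids, do_sample=True, temperature=0.7, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```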
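On the Transformers note above: once the linked PR is merged, loading should follow the standard Transformers AWQ path, where `from_pretrained` reads the AWQ `quantization_config` stored in the checkpoint (with AutoAWQ installed as the backend). A sketch under that assumption, again with a hypothetical repo id:

```python
# Hypothetical Transformers loading path, valid only once Mixtral AWQ
# support (the PR linked above) is merged into `main`. Transformers picks
# up the AWQ quantization_config from the checkpoint automatically.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/MixtralRPChat-ZLoss-AWQ"  # hypothetical repo id

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```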
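For background on the "About AWQ" paragraph, this is roughly how a 4-bit AWQ checkpoint like this one is produced with AutoAWQ. The quant_config values shown are AutoAWQ's common defaults, assumed rather than taken from this repo's actual quantization run:

```python
# Sketch of producing a 4-bit AWQ quant with AutoAWQ. The quant_config
# values are common AutoAWQ defaults, not the settings verified for this
# particular repo.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

src = "chargoddard/MixtralRPChat-ZLoss"  # unquantized source model (named in the diff)
dst = "MixtralRPChat-ZLoss-AWQ"          # local output directory

model = AutoAWQForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)  # runs activation-aware calibration

model.save_quantized(dst)
tokenizer.save_pretrained(dst)
```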
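Finally, since the diff lists vLLM among AWQ-capable backends while noting that Mixtral support is unconfirmed, here is vLLM's standard AWQ invocation, shown only for when that support is verified:

```python
# Standard vLLM AWQ usage (vLLM >= 0.2.2). Per the diff above, Mixtral
# support was not yet confirmed at the time of this commit, so treat this
# as illustrative. The repo id is hypothetical.

from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/MixtralRPChat-ZLoss-AWQ", quantization="awq")
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

for request_output in llm.generate(["What is AWQ quantization?"], sampling):
    print(request_output.outputs[0].text)
```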