This is highly recommended as it is by far the fastest in terms of inference speed and is a quick and easy option for setup.

We also illustrate setup of Oobabooga/text-generation-webui below. The settings outlined there will also apply to other uses of `Transformers`.
## Serving Quantized

Pre-quantized models are now available courtesy of our friend TheBloke:

* **GGML**: https://huggingface.co/TheBloke/OpenOrcaxOpenChat-Preview2-13B-GGML
* **GPTQ**: https://huggingface.co/TheBloke/OpenOrcaxOpenChat-Preview2-13B-GPTQ
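Whichever quantized build you run, the model still expects an OpenChat-style conversation format. As a minimal sketch (the exact `User: ... <|end_of_turn|>Assistant:` template is an assumption based on common OpenChat conventions, not stated here — check the model card of the build you download), a single-turn prompt can be assembled like this:

```python
def format_openchat_prompt(user_message: str) -> str:
    """Build a single-turn OpenChat-style prompt string.

    NOTE: this template is an assumption; verify it against the
    quantized model's card before relying on it.
    """
    return f"User: {user_message}<|end_of_turn|>Assistant:"


# Example usage: pass the resulting string as the raw prompt to your
# inference backend (llama.cpp, text-generation-webui, etc.).
prompt = format_openchat_prompt("What is the capital of France?")
print(prompt)
```

Using the raw template matters with quantized backends, since they typically take a plain prompt string rather than a chat-message list.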
## Serving with OpenChat