TheBloke committed on
Commit 2e0aebe · 1 Parent(s): a68dbfd

Update README.md

Files changed (1)
  1. README.md +1 -41
README.md CHANGED
@@ -49,9 +49,7 @@ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization metho
 
  These are experimental first AWQs for the brand-new model format, Mistral.
 
- As of September 29th 2023, they are supported by AutoAWQ, and vLLM (version 0.2).
-
- To use from AutoAWQ requires installing both AutoAWQ and Transformers from Github. More details are below.
+ As of September 29th 2023, they are only supported by AutoAWQ (version 0.1.1+)
 
  <!-- description end -->
  <!-- repositories-available start -->
@@ -86,44 +84,6 @@ Models are released as sharded safetensors files.
 
  <!-- README_AWQ.md-provided-files end -->
 
- <!-- README_AWQ.md-use-from-vllm start -->
- ## Serving this model from vLLM
-
- Make sure you are using vLLM version 0.2.
-
- Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
-
- - When using vLLM as a server, pass the `--quantization awq` parameter, for example:
-
- ```shell
- python3 python -m vllm.entrypoints.api_server --model TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantization awq --dtype float16
- ```
-
- When using vLLM from Python code, pass the `quantization=awq` parameter, for example:
-
- ```python
- from vllm import LLM, SamplingParams
-
- prompts = [
-     "Hello, my name is",
-     "The president of the United States is",
-     "The capital of France is",
-     "The future of AI is",
- ]
- sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-
- llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantization="awq", dtype="float16")
-
- outputs = llm.generate(prompts, sampling_params)
-
- # Print the outputs.
- for output in outputs:
-     prompt = output.prompt
-     generated_text = output.outputs[0].text
-     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
- ```
- <!-- README_AWQ.md-use-from-vllm start -->
-
  <!-- README_AWQ.md-use-from-python start -->
  ## How to use this AWQ model from Python code
 