Update README.md
README.md
CHANGED
@@ -284,16 +284,17 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.

 ## Setup
-Need to install vllm nightly to get some recent changes
-```Shell
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```

 Get vllm source code:
 ```Shell
 git clone git@github.com:vllm-project/vllm.git
 ```

+Install vllm
+```
+VLLM_USE_PRECOMPILED=1 pip install --editable .
+```
+
 Run the benchmarks under `vllm` root folder:

 ## benchmark_latency