Update README.md
README.md
CHANGED
@@ -284,16 +284,17 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.

 ## Setup
-Need to install vllm nightly to get some recent changes
-```Shell
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```

 Get vllm source code:
 ```Shell
 git clone git@github.com:vllm-project/vllm.git
 ```

+Install vllm
+```
+VLLM_USE_PRECOMPILED=1 pip install --editable .
+```
+
 Run the benchmarks under `vllm` root folder:

 ## benchmark_latency